TITLE OF THE INVENTION

CHITIN SYNTHASE 1

Field of the Invention

This invention relates generally to the field of gene expression
and specifically to genes essential for growth and to a vector and
a method for the identification of such genes, as well as
identification of eukaryotic promoters.

Background of the Invention 
Many eukaryotic genes are regulated in an inducible, cell
type-specific or constitutive manner. There are several types of
structural elements which are involved in the regulation of gene
expression. There are cis-acting elements, located in the proximity
of, or within, genes which serve to bind sequence-specific DNA
binding proteins, as well as trans-acting factors. The binding of
proteins to DNA is responsible for the initiation, maintenance, or
down-regulation of transcription of genes.

The cis-acting elements which control genes are called promoters,
enhancers or silencers. Promoters are positioned next to the start
site of transcription and function in an orientation-dependent
manner, while enhancer and silencer elements, which modulate the
activity of promoters, are flexible with respect to their
orientation and distance from the start site of transcription.

For many years, various drugs have been tested for their ability to
alter the expression of genes or the translation of their messages
into protein products. One problem with existing drug therapy is
that it tends to act indiscriminately on genes and promoters and
therefore affects healthy cells as well as neoplastic cells.
Likewise, in the case of a pathogen-associated disease, it is
critical to administer a pathogen-specific therapy to avoid any
detrimental effect on the non-infected cells.

Chitin, a linear β-1,4-linked polymer of N-acetylglucosamine, is
present in the cell walls of all true fungi, but is absent from
mammalian cells. Studies in S. cerevisiae (reviewed in Bulawa, C.,
Mol. Cell. Biol. 12:1764, 1992; Cabib et al., Arch. Med. Res.,
21:301, 1993) have shown that the synthesis of chitin is
surprisingly complex, requiring at least three isozymes encoded by
the CHS1, CHS2, and CSD2 genes. In cell-free extracts, all of the
isozymes catalyze the formation of chitin using
UDP-N-acetylglucosamine as the substrate. In cells, each isozyme
makes chitin at a unique location in the cell during a specified
interval of the cell cycle. Genetic analyses indicate that CHS2 is
involved in the synthesis of the chitin-rich primary septum that
separates mother and daughter cells, CSD2 is required for synthesis
of the chitin rings, and CHS1 plays a role in cell wall repair.
Thus, the three isozymes are not functionally redundant and do not
substitute for one another.

Chitin synthase genes have been identified from a diverse group of
fungi, and analysis of the deduced amino acid sequences of these
genes has led to the identification of two chitin synthase gene
families (Bowen, et al., Proc. Natl. Acad. Sci., USA, 89:519,
1992). Members of one family are related to the S. cerevisiae CHS
genes (CHS family). Based on sequence analyses, the CHS family can
be subdivided into classes I, II, and III. Members of the second
family are related to the S. cerevisiae CSD2 gene.

The functions of class II CHS genes have been investigated in a
number of fungi by gene disruption. In S. cerevisiae, the class II
CHS mutant (designated chs2) is defective in cell separation
(Bulawa and Osmond, Proc. Natl. Acad. Sci., USA, 87:7424, 1990;
Shaw et al., J. Cell Biol., 114(1), 1990). In A. nidulans (Yanai et
al., Biosci. 58(10):1828, 1994) and U. maydis (Gold and Kronstad,
Molecular Microbiology, 11(5):897, 1994), class II CHS mutants
(designated chsA and chs1, respectively) have no obvious phenotype.
Thus, all of the class II CHS genes studied to date are
nonessential for growth. In addition, Young, et al. identified a
chitin synthase gene which encodes only part of the chitin synthase
activity in C. albicans (Molec. Micro., 4(2):197, 1990).

There have been methods designed to identify virulence genes of
microorganisms involved in pathogenesis. For example, Osbourn, et
al. utilized a promoter-probe plasmid for use in identifying
promoters that are induced in vivo in plants by Xanthomonas
campestris (EMBO J., 6:23, 1987). Random chromosomal DNA fragments
were cloned into a site in front of a promoterless chloramphenicol
acetyltransferase gene contained in the plasmid and the plasmids
were transferred into Xanthomonas to form a library. Individual
transconjugates were introduced into chloramphenicol-treated
seedlings to determine whether the transconjugate displayed
resistance to chloramphenicol in the plant.

Knapp, et al., disclosed a method for identifying virulence genes
based on their coordinate expression with other known virulence
genes under defined laboratory conditions (J. Bacteriol., 170:5059,
1988). Mahan, et al. (U.S. Patent No. 5,434,065) described an in
vivo genetic system to select for microbial genes that are
specifically induced when microbes infect their host. The method
depends on complementing the growth of an auxotrophic or antibiotic
sensitive microorganism by integrating an expression vector by way
of homologous recombination into the auxotrophic or antibiotic
sensitive microorganism's chromosome and inducing the expression of
a synthetic operon which encodes transcripts, the expression of
which are easily monitored in vitro following in vivo
complementation.

These systems all describe methods of identifying genes involved in
pathogenesis in bacterial-host systems. There is a need to identify
specific targets of eukaryotic pathogens, e.g., fungi, in an
infected cell which are associated with the expression of genes
whose expression products are implicated in disease, in order to
increase efficacy of treatment of infected cells and to increase
the efficiency of developing drugs effective against genes
essential for survival of these pathogens.

The present invention provides a method for identifying targets
essential for growth as well as specific targets identified by the
method.

Summary of the Invention 
The present invention provides a yeast chitin synthase (CHS1)
polypeptide and a polynucleotide encoding the polypeptide. In the
present invention, the class II CHS gene of C. albicans (encoded by
the CHS1 gene) is shown to be essential for growth under laboratory
conditions and for colonization of tissues during infection in
vivo. Thus, CHS1 is a target for the development of antifungal
drugs.

CHS1 inhibitors are useful for inhibiting the growth of a yeast.
Such CHS1 inhibitory reagents include, e.g., anti-CHS1 antibodies
and CHS1 antisense molecules.

CHS1 can be used to determine whether a compound affects (e.g.,
inhibits) CHS1 activity, by incubating the compound with CHS1
polypeptide, or with a recombinant cell expressing CHS1, under
conditions sufficient to allow the components to interact, and then
determining the effect of the compound on CHS1 activity or
expression.






WO 97/16540 



PCT/US96/17459 




The invention also provides a vector for identifying a eukaryotic
regulatory polynucleotide, including a selectable marker gene; a
restriction endonuclease site located at the 5' terminus of the
selectable marker gene where a regulatory polynucleotide can be
inserted to be operably linked to the selectable marker gene; and a
polynucleotide for targeted integration of the vector into the
chromosome of a susceptible host. Preferably, the eukaryotic
regulatory polynucleotide is a promoter region, and most
preferably, a promoter region of pathogenic yeast such as Candida
albicans. The vector of the invention is preferably transferred to
a library of host cells, wherein each host cell contains the
vector.

The vector of the invention can be used to identify a eukaryotic
regulatory polynucleotide. The method involves inserting genomic
DNA of a eukaryotic organism into the vector, wherein the DNA is in
operable linkage with the selectable marker gene; transforming a
susceptible host with the vector; detecting expression of the
selectable marker gene, wherein expression is indicative of
operable linkage to a regulatory polynucleotide; and identifying
the regulatory polynucleotide.

The vector of the invention also can be used to identify a
composition which affects the regulatory DNA (promoter). The method
involves incubating the composition to be tested and the promoter,
under conditions sufficient to allow the promoter-containing vector
of the invention and the composition to interact, and then
measuring the effect the composition has on the promoter. The
observed effect on the promoter may be either inhibitory or
stimulatory.

The method of the invention is useful for identification of
promoters from any eukaryote. Particularly preferred eukaryotes are
fungal pathogens including, but not limited to, Candida albicans,
Rhodotorula sp., Saccharomyces cerevisiae, Blastoschizomyces
capitatus, Histoplasma capsulatum, Aspergillus fumigatus,
Coccidioides immitis, Paracoccidioides brasiliensis, Blastomyces
dermatitidis, and Cryptococcus neoformans.

The invention also features a regulatory polynucleotide (a
promoter) isolated using a library of host cells containing the
vector of the invention; the promoter is a maltose responsive
promoter (MRP), which is induced by maltose and repressed by
glucose. MRP is useful for determining whether a polynucleotide
encodes a growth-associated polypeptide; the method involves
incubating a cell containing the polynucleotide operably linked
with the MRP, under conditions which repress the regulatory
polynucleotide, and then determining the effect of the expression
of the polynucleotide on the growth of the cell.

Brief Description of the Drawings
Figure 1a is a comparison of CHS1 clones.

Figure 1b-g is the nucleotide (SEQ ID NO:1 corresponds to the
coding strand and the sequence of SEQ ID NO:3 is complementary to
the coding strand) and deduced amino acid sequence (SEQ ID NO:2) of
Chitin Synthase (CHS1) isolated from Candida albicans.

Figure 2a is a restriction map of the vector pBluescript® II KS
(+/-).

Figure 2b is a restriction map of the vector pVGCA2.

Figure 3a-b is the nucleotide sequence (SEQ ID NO:4) of the maltose
responsive promoter (MRP) from C. albicans ("X" represents A, G, C,
or T/U).

Figure 4 is a schematic illustration showing regulated expression
of CHS1 operatively linked to MRP.

Figure 5 is a schematic illustration showing the bidirectional
regulation capability of MRP.

Figure 6 is a restriction map of the pKW044 vector including the
CHS1 gene.

Figure 7 is a demonstration of gene inactivation during infection
by MRP. Panels A and B show neutropenic mice, and Panels C and D
show immunocompetent mice, infected with the indicated strains of
C. albicans.

Detailed Description 
The invention provides genes essential for growth, such as the
chitin synthase gene from Candida albicans (CaCHS1), as well as
vectors for identification of eukaryotic promoters. Preferably, the
vector is used for the identification of promoters of fungal
pathogens such as Candida albicans. The vectors allow
identification of promoters and genes under the control of such
promoters, many of which are involved in the infection process. A
maltose responsive promoter (MRP) is provided as an example of a
promoter isolated using the vector of the invention.

Identification of a yeast gene essential for cell growth 
The invention provides a substantially pure chitin synthase (CHS1)
polypeptide. The term "substantially pure" as used herein refers to
CHS1 which is substantially free of other proteins, lipids,
carbohydrates or other materials with which it is naturally
associated. One skilled in the art can purify CHS1 using standard
techniques for protein purification. The substantially pure
polypeptide will yield a single major band on a non-reducing
polyacrylamide gel. The purity of the CHS1 polypeptide can also be
determined by amino-terminal amino acid sequence analysis. CHS1
polypeptide includes functional fragments of the polypeptide,
provided that the activity of CHS1 remains. Smaller peptides
containing the biological activity of CHS1 are also included in the
invention.

The invention also provides polynucleotides encoding the CHS1
protein. These polynucleotides include DNA, cDNA and RNA sequences
which encode CHS1. It is understood that all polynucleotides
encoding all or a portion of CHS1 are also included herein, as long
as they encode a polypeptide with CHS1 activity. Such
polynucleotides include naturally occurring, synthetic, and
manipulated polynucleotides. For example, CHS1 polynucleotide may
be subjected to site-directed mutagenesis.

The polynucleotide sequence for CHS1 can be used to produce
antisense sequences as well as sequences that are degenerate as a
result of the degeneracy of the genetic code; there are 20 natural
amino acids, most of which are specified by more than one codon.
Therefore, all degenerate nucleotide sequences are included in the
invention, provided the amino acid sequence of CHS1 polypeptide
encoded by the nucleotide sequence is functionally unchanged.
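
The degeneracy described above can be illustrated with a short
sketch. It is not part of the disclosed method; it simply builds
the standard genetic code table and counts how many distinct DNA
sequences would encode a given peptide. The function names and the
example peptide are hypothetical and chosen only for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """Return every codon that specifies the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_encodings(peptide):
    """Count the distinct DNA sequences that encode the peptide."""
    total = 1
    for amino_acid in peptide:
        total *= len(synonymous_codons(amino_acid))
    return total


if __name__ == "__main__":
    # Hypothetical three-residue peptide (Met-Leu-Trp), not taken from CHS1.
    print(synonymous_codons("L"))
    print(count_encodings("MLW"))
```

Because the count is a product over residues, even a short peptide
can have many degenerate coding sequences, which is why the text
claims the whole set rather than a single nucleotide sequence.
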

Specifically disclosed herein is the yeast CHS1 gene, more
specifically, the Candida albicans CHS1 gene. The sequence is 3084
base pairs long and contains an open reading frame encoding a
polypeptide 1027 amino acids in length and having a molecular
weight of about 116 kD as determined by reducing SDS-PAGE.
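
The figures quoted above can be cross-checked with simple
arithmetic. The sketch below assumes one stop codon at the end of
the open reading frame and an average residue mass of about 110 Da;
both are rule-of-thumb assumptions, not values taken from the
specification, and the function names are hypothetical.

```python
# Assumption: a commonly used average mass per amino acid residue, in daltons.
AVERAGE_RESIDUE_MASS_DA = 110


def orf_length_bp(n_amino_acids, include_stop=True):
    """Length in base pairs of an ORF encoding n_amino_acids residues."""
    return 3 * n_amino_acids + (3 if include_stop else 0)


def approx_mass_kda(n_amino_acids):
    """Rough polypeptide mass in kilodaltons from residue count alone."""
    return n_amino_acids * AVERAGE_RESIDUE_MASS_DA / 1000


if __name__ == "__main__":
    print(orf_length_bp(1027))    # 1027 codons plus one stop codon
    print(approx_mass_kda(1027))  # rough estimate only
```

Under these assumptions, 1027 codons plus a stop codon account for
3084 base pairs, which is consistent with the stated sequence
length, and the estimated mass falls in the same range as the value
reported from SDS-PAGE. Gel mobility depends on more than residue
count, so close agreement should not be expected.
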

Preferably, the C. albicans CHS1 nucleotide sequence is SEQ ID NO:1
and the deduced amino acid sequence is SEQ ID NO:2 (Figure 1b-g).

The polynucleotide encoding CHS1 includes SEQ ID NO:1 as well as
nucleic acid sequences capable of hybridizing to SEQ ID NO:1 under
stringent conditions. A complementary sequence may include an
antisense nucleotide. When the sequence is RNA, the
deoxynucleotides A, G, C, and T of SEQ ID NO:1 are replaced by
ribonucleotides A, G, C, and U, respectively. Also included in the
invention are fragments of the above-described nucleic acid
sequences that are at least 15 bases in length, which is sufficient
to permit the fragment to selectively hybridize to DNA that encodes
the protein of SEQ ID NO:2 under stringent physiological
conditions.
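
The relationships described above (complementary strand, RNA
equivalent, and fragments of at least 15 bases) can be sketched in
a few lines. This is an illustrative sketch only: the example
sequence is hypothetical and is not taken from SEQ ID NO:1, and the
function names are not from the specification. The final helper
reflects one common rationale for a 15-base minimum, namely the
number of distinct sequences of that length; the specification
itself does not state this reasoning.

```python
# Complement mapping for DNA bases (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def to_rna(dna):
    """Replace T with U to give the RNA form of a DNA sequence."""
    return dna.upper().replace("T", "U")


def antisense_fragments(coding_strand, length=15):
    """List (offset, antisense oligomer) for every window of the given length."""
    last_start = len(coding_strand) - length
    return [
        (i, reverse_complement(coding_strand[i:i + length]))
        for i in range(last_start + 1)
    ]


def distinct_sequences(length):
    """Number of possible nucleotide sequences of the given length."""
    return 4 ** length


if __name__ == "__main__":
    example = "ATGGCTAAGCTTGACCGTAAC"  # hypothetical, NOT from SEQ ID NO:1
    print(to_rna(example))
    for offset, oligo in antisense_fragments(example)[:2]:
        print(offset, oligo)
    print(distinct_sequences(15))
```
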

The CHS1 polypeptide of the invention can be used to produce
antibodies which are immunoreactive with or which specifically bind
to epitopes of the CHS1 polypeptide. As used herein, the term
"epitope" means any antigenic determinant of an antigen to which an
antibody to the antigen binds.

Antibodies can be made to the protein of the invention, including
monoclonal antibodies, which are made by methods well known in the
art (Kohler, et al., Nature, 256:495, 1975; Current Protocols in
Molecular Biology, Ausubel, et al., ed., 1989).

The term "antibody" as used in this invention includes intact
molecules as well as fragments thereof, such as Fab, F(ab')2, and
Fv, which are capable of binding the epitopic determinant. These
antibody fragments retain the ability to selectively bind with
their antigen or receptor and are defined as follows: (1) Fab, the
fragment which contains a monovalent antigen-binding fragment of an
antibody molecule, can be produced by digestion of whole antibody
with the enzyme papain to yield an intact light chain and a portion
of one heavy chain; (2) Fab', the fragment of an antibody molecule
that can be obtained by treating whole antibody with pepsin,
followed by reduction, to yield an intact light chain and a portion
of the heavy chain; two Fab' fragments are obtained per antibody
molecule; (3) (Fab')2, the fragment of the antibody that can be
obtained by treating whole antibody with the enzyme pepsin without
subsequent reduction; F(ab')2 is a dimer of two Fab' fragments held
together by two disulfide bonds; (4) Fv, defined as a genetically
engineered fragment containing the variable region of the light
chain and the variable region of the heavy chain expressed as two
chains; and (5) Single chain antibody ("SCA"), defined as a
genetically engineered molecule containing the variable region of
the light chain and the variable region of the heavy chain, linked
by a suitable polypeptide linker as a genetically fused single
chain molecule. Methods of making these fragments are known in the
art. (See, for example, Harlow and Lane, Antibodies: A Laboratory
Manual, Cold Spring Harbor Laboratory, New York (1988),
incorporated herein by reference.)

Antibodies which bind to the CHS1 polypeptide of the invention can
be prepared using an intact polypeptide or fragments containing
small peptides of interest as the immunizing antigen. The
polypeptide or a peptide used to immunize an animal can be derived
from transcribed/translated cDNA or chemical synthesis, and can be
conjugated to a carrier protein, if desired. Such commonly used
carriers which can be chemically coupled to the peptide include
keyhole limpet hemocyanin (KLH), thyroglobulin, bovine serum
albumin (BSA), and tetanus toxoid. The coupled peptide is used to
immunize the animal (e.g., a mouse, a rat, or a rabbit).

It is also possible to use the anti-idiotype technology to produce
monoclonal antibodies which mimic an epitope. For example, an
anti-idiotypic monoclonal antibody made to a first monoclonal
antibody will have a binding domain in the hypervariable region
which is the "image" of the epitope bound by the first monoclonal
antibody.

The invention also provides a method for inhibiting the growth of
yeast, by contacting the yeast with a reagent which suppresses CHS1
activity. Preferably the yeast is C. albicans.

Where a disease or disorder is associated with the production of
CHS1 (e.g., a yeast infection), nucleic acid sequences that
interfere with CHS1 expression at the translational level can be
used to treat the infection. This approach utilizes, for example,
antisense nucleic acids, ribozymes, or triplex agents to block
transcription or translation of CHS1 mRNA, either by masking that
mRNA with an antisense nucleic acid or triplex agent or by cleaving
it with a ribozyme.

Antisense nucleic acids are DNA or RNA molecules that are
complementary to at least a portion of a specific mRNA molecule
(Weintraub, Scientific American, 262:40, 1990). In the cell, the
antisense nucleic acids hybridize to the corresponding mRNA,
forming a double-stranded molecule. The antisense nucleic acids
interfere with the translation of the mRNA, as the cell will not
translate an mRNA that is double-stranded. Antisense oligomers of
about 15 nucleotides are preferred, since they are easily
synthesized and are less likely to cause problems than larger
molecules when introduced into the CHS1-producing cell (e.g.,
Candida albicans). The use of antisense methods to inhibit the in
vitro translation of genes is well known in the art (Marcus-Sakura,
Anal. Biochem., 172:289, 1988).

Use of an oligonucleotide to block transcription is known as the
triplex strategy; the oligomer winds around double-helical DNA,
forming a three-strand helix. These triplex compounds can be
designed to recognize a unique site on a chosen gene (Maher, et
al., Antisense Res. and Dev., 1(2):227, 1991; Helene, C.,
Anticancer Drug Design, 6(6):569, 1991).

The reagent used for inhibition of the growth of yeast by
suppression of CHS1 activity can be an anti-CHS1 antibody. Addition
of such an antibody to a cell or tissue suspected of containing a
yeast, such as C. albicans, can prevent cell growth by inhibiting
cell wall formation.

The invention also provides a method for detecting a yeast cell in
a host tissue, for example, which comprises contacting an anti-CHS1
antibody or CHS1 polynucleotide with a cell having a
yeast-associated infection and detecting binding to the antibody or
hybridizing with the polynucleotide, respectively. The antibody or
polynucleotide reactive with CHS1 or DNA encoding CHS1 is labeled
with a label which allows detection of binding or hybridization to
CHS1 or the DNA. An antibody specific for CHS1 polypeptide or
polynucleotide specific for CHS1 polynucleotide may be used to
detect the level of CHS1 in biological fluids and tissues of a
patient.

The antibodies of the invention can be used, for example, in
immunoassays in which they can be utilized in liquid phase or bound
to a solid phase carrier.

The anti-CHS1 antibodies of the invention can be bound to a solid
support and used to detect the presence of an antigen of the
invention. Examples of well-known supports include glass,
polystyrene, polypropylene, polyethylene, dextran, nylon, amylases,
natural and modified celluloses, polyacrylamides, agaroses and
magnetite. The nature of the carrier can be either soluble or
insoluble for purposes of the invention. Those skilled in the art
will know of other suitable carriers for binding antibodies, or
will be able to ascertain such, using routine experimentation.

The CHS1 antibodies of the invention can be used in vitro and in
vivo to monitor the course of amelioration of a yeast-associated
disease in a subject. Thus, for example, by measuring the increase
or decrease in the number of cells expressing antigen comprising
CHS1 polypeptide of the invention or changes in the concentration
of such antigen present in various body fluids, it is possible to
determine whether a particular therapeutic regimen aimed at
ameliorating the yeast-associated disease is effective. The term
"ameliorate" denotes a lessening of the detrimental effect of the
yeast-associated disease in the subject receiving therapy.

The CHS1 of the invention is also useful in a screening method to
identify compounds or compositions which affect the activity of the
protein. To determine whether a compound affects CHS1 activity, the
compound is incubated with CHS1 polypeptide, or with a recombinant
cell expressing CHS1, under conditions sufficient to allow the
components to interact; the effect of the compound on CHS1 activity
or expression is then determined.

The increase or decrease of chitin synthase
transcription/translation can be measured by adding a radioactive
compound to the mixture of components, such as 32P-ATP or 35S-Met,
and observing radioactive incorporation into CHS1 transcripts or
protein, respectively. Alternatively, other labels may be used to
determine the effect of a composition on CHS1
transcription/translation. For example, a radioisotope, a
fluorescent compound, a bioluminescent compound, a chemiluminescent
compound, a metal chelator or an enzyme could be used. Those of
ordinary skill in the art will know of other suitable labels or
will be able to ascertain such, using routine experimentation.
Analysis of the effect of a compound on CHS1 is performed by
standard methods in the art, such as Northern blot analysis (to
measure gene expression) or SDS-PAGE (to measure protein product),
for example. Further, CHS1 enzymatic activity can also be
determined, for example, by incorporation of labeled precursor of
chitin. Preferably, such precursor is UDP-N-acetylglucosamine.
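
A label-incorporation assay of this kind is typically reduced to a
single figure by comparing treated and untreated samples. The
sketch below shows one conventional way to do that, as percent
inhibition relative to a control after background subtraction. The
specification does not prescribe this calculation; the function
name, the parameters, and the counts in the usage example are
hypothetical.

```python
def percent_inhibition(control_counts, treated_counts, background_counts=0.0):
    """Percent reduction in label incorporation relative to the control.

    Positive values indicate inhibition, negative values stimulation.
    Counts are in any consistent unit (e.g., counts per minute).
    """
    control = control_counts - background_counts
    if control <= 0:
        raise ValueError("control signal must exceed background")
    treated = treated_counts - background_counts
    return 100.0 * (1.0 - treated / control)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not experimental data.
    print(percent_inhibition(control_counts=1000, treated_counts=250))
    print(percent_inhibition(control_counts=1000, treated_counts=1500))
```

A negative result corresponds to a stimulatory compound, so the
same calculation covers both directions of effect.
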

Vector for identification of a eukaryotic regulatory polynucleotide

The vector contains at least one promoterless selectable marker
gene and a restriction endonuclease cloning site located at the 5'
terminus of the selectable marker. A pool of chromosomal DNA
fragments from a eukaryotic organism is inserted at the restriction
endonuclease cloning site in operable linkage with the selectable
marker polynucleotide. In addition, the vector contains a
polynucleotide sequence for targeted integration of the vector into
the chromosome of a susceptible host.

As used herein, the term "vector" refers to a 
nucleic acid molecule capable of transporting another 
nucleic acid, to which it has been operatively linked, 
from one genetic environment to another. 

The term "regulatory polynucleotide" as used herein preferably
refers to a promoter, but can also include enhancer elements. The
vectors of the invention contain a promoterless selectable marker
gene having a cloning site at the 5' terminus of the gene. The
vectors also include a cloning site 5' of the selectable marker
gene, which is operably associated with a promoter. The term
"operably associated" or "operably linked" refers to functional
linkage between the promoter sequence and the controlled nucleic
acid sequence; the sequence and promoter are typically covalently
joined, preferably by conventional phosphodiester bonds.

The expression vectors of the invention employ a promoterless gene
for selection of a promoter sequence. The vectors contain other
elements typical of vectors, including an origin of replication, as
well as genes which are capable of providing phenotypic selection
of transformed cells. The transformed host cells can be grown in
the appropriate media and environment, e.g., in fermentors, and
cultured according to techniques known in the art to achieve
optimal cell growth. The vectors of the present invention can be
expressed in vivo in either prokaryotes or eukaryotes. Methods of
expressing DNA sequences containing eukaryotic coding sequences in
prokaryotes are well known in the art. Biologically functional
plasmid DNA vectors used to incorporate DNA sequences of the
invention for expression and replication in the host cell are
described herein. For example, DNA can be inserted into yeast cells
using the vectors of the invention. Various shuttle vectors for the
expression of foreign genes in yeast have been reported (Heinemann,
et al., Nature, 340:205, 1989; Rose, et al., Gene, 60:237, 1987).

Host cells include microbial, yeast, and mammalian 
cells, e.g., prokaryotes and eukaryotes such as yeast, 
filamentous fungi, and plant and animal cells. 

Transformation or transfection with recombinant DNA may be carried
out by conventional techniques well known to those skilled in the
art. Where the host is prokaryotic, such as E. coli, competent
cells which are capable of DNA uptake can be prepared from cells
harvested after the exponential growth phase and subsequently
treated, i.e., by the CaCl2 method using procedures well known in
the art.

Where the host cell is eukaryotic, various methods of DNA transfer
can be used. These include transfection of DNA by calcium
phosphate-precipitates, conventional mechanical procedures such as
microinjection, insertion of a plasmid encased in liposomes,
spheroplast electroporation, salt mediated transformation of
unicellular organisms, or the use of viral vectors. A library of
host cells, wherein each host cell contains a vector according to
the description above, is also included in the invention.

Eukaryotic DNA can be cloned into prokaryotes using vectors well
known in the art. Because there are many functions in eukaryotic
cells which are absent in prokaryotes (e.g., localization of
ATP-generating systems to mitochondria, association of DNA with
histones, mitosis and meiosis, and differentiation of cells), the
genetic control of such functions must be assessed in a eukaryotic
environment. Many eukaryotic vectors, though, are capable of
replication in E. coli, which is important for amplification of the
vector DNA. Thus, vectors preferably contain markers, e.g., LEU2,
HIS3, URA3, that can be selected easily in yeast, and in addition,
also carry antibiotic resistance markers for use in E. coli.

The selectable marker gene, which lies immediately downstream from
the cloning site, preferably encodes a biosynthetic pathway enzyme
of a eukaryote which relies on the enzyme for growth or survival.
This biosynthetic pathway gene, once activated, will complement the
growth of an auxotrophic host, deficient for the same biosynthetic
pathway gene, in which it is integrated. Typically, genes encoding
amino acid biosynthetic enzymes are utilized, since many strains
are available having at least one of these mutations, and
transformation events are easily selected by omitting the amino
acid from the medium. Examples of markers include but are not
limited to URA3, URA3-hisG, LEU2, LYS2, HIS3, HIS4, TRP1, ARG4,
HgmR, and TUNR. Preferably, the vector includes a promoterless URA3
gene. Expression of the C. albicans URA3 gene is required for the
infection process, thus creating a strong selection pressure for
those sequences cloned upstream of the promoterless URA3 gene that
will be induced during the infection process.

The vector of the invention preferably includes a prokaryotic
origin of replication or replicon, i.e., a DNA sequence having the
ability to direct autonomous replication and maintenance of the
recombinant DNA molecule extra-chromosomally in a transformed
prokaryotic host cell. Such origins of replication are well known
in the art; preferred origins of replication are those that are
efficient in the host organism, e.g., the preferred host cell, E.
coli. For vectors used in E. coli, a preferred origin of
replication is ColE1, which is found in pBR322 and a variety of
other common plasmids. Also preferred is the p15A origin of
replication found on pACYC and its derivatives. The ColE1 and p15A
replicons have been extensively utilized in molecular biology, are
available on a variety of plasmids, and are described, e.g., in
Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd
edition, Cold Spring Harbor Laboratory Press, 1989.

The ColE1 and p15A replicons are particularly preferred for use in
the invention because they each have the ability to direct the
replication of a plasmid in E. coli while the other replicon is
present in a second plasmid in the same E. coli cell. In other
words, ColE1 and p15A are non-interfering replicons that allow the
maintenance of two plasmids in the same host (see, for example,
Sambrook, et al., supra, at pages 1.3-1.4).

The vector of the invention includes a polylinker
multiple cloning site for insertion of selectable marker
genes. A sequence of nucleotides adapted for directional
ligation, i.e., a polylinker, is a region of the DNA
expression vector that (1) operatively links for
replication and transport the upstream and downstream
translatable DNA sequences, and (2) provides a site for
directional ligation of a DNA sequence into the vector.
Typically, a directional polylinker is a sequence of
nucleotides that defines two or more restriction
endonuclease recognition sequences. Upon restriction
cleavage, the two sites yield cohesive termini to which a
translatable DNA sequence can be ligated to the DNA
expression vector. Preferably, the two restriction sites
provide, upon restriction cleavage, cohesive termini
that are non-complementary and thereby permit directional
insertion of a translatable DNA sequence into the
cassette. Where the sequence of nucleotides adapted for
directional ligation defines numerous restriction sites,
it is referred to as a multiple cloning site.
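
The requirement for non-complementary cohesive termini can be illustrated with a short Python sketch. It is an illustration only: the overhang table lists the 5' overhangs left by SalI, XhoI and XbaI (enzymes that appear in the Examples), and the helper names revcomp, can_anneal and is_directional are hypothetical names chosen for this sketch.

```python
# Minimal sketch: two cohesive ends can anneal only when one overhang is the
# reverse complement of the other. A vector whose two ends cannot anneal to
# each other accepts an insert in a single orientation.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# 5' overhangs, written 5'->3', left after cleavage.
OVERHANG = {
    "SalI": "TCGA",  # G^TCGAC
    "XhoI": "TCGA",  # C^TCGAG
    "XbaI": "CTAG",  # T^CTAGA
}


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_anneal(enzyme_a: str, enzyme_b: str) -> bool:
    """True when the ends left by the two enzymes are complementary."""
    return OVERHANG[enzyme_a] == revcomp(OVERHANG[enzyme_b])


def is_directional(left: str, right: str) -> bool:
    """True when a vector cut with both enzymes has non-complementary ends."""
    return not can_anneal(left, right)


if __name__ == "__main__":
    print(is_directional("SalI", "XbaI"))  # ends differ, insertion is directional
    print(is_directional("SalI", "XhoI"))  # ends are compatible, not directional
```

In this sketch a SalI/XbaI pair is reported as directional, while SalI and XhoI, which leave the same overhang, are not.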

Additionally, the vector may contain a
phenotypically selectable marker gene to identify host
cells which contain the expression vector. Examples of
markers typically used in prokaryotic expression vectors
include antibiotic resistance genes for ampicillin
(β-lactamases), tetracycline and chloramphenicol
(chloramphenicol acetyltransferase).

The vector contains a polynucleotide sequence for
targeted integration of the vector into the chromosome of
a susceptible host. Targeted integration, as opposed to
random integration, results in more stable transformants
and avoids position effects or integration into genes
required for growth and infection. Preferably, the gene
for targeted integration is also a selectable marker,
thereby allowing the identification of transformants that
contain the vector. Such genes include the adenine
biosynthesis (ADE2) gene of Candida albicans. A
susceptible host is a host having a site recognized by the
polynucleotide of the vector for targeted integration.

Promoters identified by the method of the
invention can be inducible or constitutive promoters.
Inducible promoters can be regulated, for example, by
nutrients (e.g., carbon sources, nitrogen sources, and
others), drugs (e.g., drug resistance), environmental
agents that are specific for the infection process (e.g.,
serum response), and temperature (e.g., heat shock, cold
shock).

Identification of a eukaryotic regulatory polynucleotide
The selection method of the invention utilizes an
auxotrophic organism, or an organism that has a mutation
in a biosynthetic pathway gene encoding a functional
biosynthetic enzyme necessary for the growth of the
organism. When a functional or wild-type copy of a
biosynthetic pathway gene is inserted into an auxotroph,
the expression of the wild-type biosynthetic pathway gene
provides the auxotroph with the biosynthetic enzyme
required for growth or survival. The process of
replacing a missing or non-functional gene of an
auxotroph with a functional homologous gene in order to
restore the auxotroph's ability to survive within a host
cell is called "complementation".

Complementation of the auxotroph, according to the
present invention, is accomplished by construction of a
vector having a promoterless structural gene encoding a
biosynthetic enzyme, i.e., a selectable marker
polynucleotide, as described above. The cloning site for
the promoter of interest is at the 5' terminus of the
structural gene encoding the biosynthetic enzyme.
Consequently, a promoter region operatively linked to any
gene or set of genes will control the expression of that
gene or genes. In order to be controlled by the
promoter, the gene must be positioned downstream from the
promoter.

The structural gene encoding a biosynthetic enzyme
in the vector of the invention does not contain
recognition sequences for regulatory factors to allow
transcription of the structural gene. Consequently, the
product encoded by the structural gene is not capable
of being expressed unless a promoter sequence is inserted
into the cloning site 5' to the structural gene.

A second structural gene in the vector allows for
targeted insertion and integration into the host cell's
chromosomal DNA. Optionally, the vector may contain
additional genes, such as those encoding selective markers
for selection in bacteria. Typically, drug resistance
genes such as those described above are used for such
selection.

In the method of the invention, total genomic DNA
is isolated from the organism, e.g., Candida albicans,
and then partially enzymatically digested, resulting in a
pool of random chromosomal fragments. The vector of the
invention is cleaved at the restriction/cloning site, and
mixed with the cleaved chromosomal DNA. The chromosomal
fragments are ligated into the vector to produce a
library, i.e., each vector contains a random chromosomal
fragment so that the pool of vectors is representative of
the entire organism's genome. The vectors containing the
chromosomal fragments are then introduced into the host
organism (e.g., an auxotrophic strain or drug resistant
strain of Candida albicans) by methods well known in the
art. For example, the vectors may be introduced by
transformation.

After the vector is introduced into the host
(e.g., an auxotroph), the vector may integrate into the
auxotroph's chromosome by targeted integration. This
step can be detected by selection, as described above.
For example, the preferred polynucleotide for targeted
insertion and integration in Candida albicans is the ADE2
gene. The presence of this gene is detectable by growth
of the organism on adenine deficient media.

The expression of the biosynthetic enzyme gene,
e.g., URA3, whether under constitutive or inducible
conditions, is identified by complementation of a host cell
strain in which the gene is defective or missing, e.g.,
URA3-. Only those host cells which can grow in medium
lacking the nutritional supplement, e.g., uracil, will be
expected to contain a cloned functional promoter
sequence.

Identification of a yeast regulatory polynucleotide
capable of induction and repression

In another aspect, the invention provides an
isolated regulatory polynucleotide, the MRP promoter,
characterized in that it is induced by maltose and
repressed by glucose. MRP of the invention is
exemplified by the nucleotide sequence of SEQ ID NO:4
(Figure 3a-b), wherein the sequence is 1734 base pairs in
length. MRP was isolated from a promoter library based
on expression of the URA3 gene of C. albicans as
described above. MRP functions bidirectionally, that is,
genes flanking MRP both 5' and 3' are controlled by this
regulatory polynucleotide.

The MRP of the invention is useful for identifying
genes which are essential for cell growth. Thus, the
invention provides a method for determining whether a
polynucleotide encodes a growth-associated polypeptide,
by incubating a cell containing the polynucleotide
operably linked with the MRP regulatory polynucleotide,
under conditions which repress the regulatory
polynucleotide, and determining the effect of the tested
polynucleotide on the growth of the cell.

Because MRP of the invention promotes transcription in the
presence of maltose, while the ability of MRP to promote
transcription is repressed by glucose, a cell having a
polynucleotide of interest operably linked to MRP can be
grown on a glucose-containing medium to determine whether
the polynucleotide of interest is essential for cell
growth. MRP is repressed on glucose, thus repressing
transcription of the operably linked polynucleotide;
therefore, if a cell grown on a glucose-containing medium
dies, the polynucleotide is determined to be essential
for cell growth.
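
The decision rule described above can be written as a small Python sketch. It is a simplified illustration, not part of the claimed method: the function name classify_gene, the boolean inputs and the "uninformative" outcome for a construct that fails to grow even under inducing conditions are assumptions introduced here.

```python
# Minimal sketch of the essentiality test: a gene placed under MRP control is
# expressed on maltose (induced) and shut off on glucose (repressed).


def classify_gene(grows_on_maltose: bool, grows_on_glucose: bool) -> str:
    """Classify an MRP-linked gene from growth under both conditions."""
    if not grows_on_maltose:
        # Assumption: no growth even when the gene is induced tells us
        # nothing about essentiality; the construct itself may be faulty.
        return "uninformative"
    if grows_on_glucose:
        # Cells survive with the gene repressed.
        return "not essential"
    # Cells grow when the gene is on and die when it is off.
    return "essential"


if __name__ == "__main__":
    print(classify_gene(grows_on_maltose=True, grows_on_glucose=False))
```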

MRP can be used to induce (maltose) or repress
(glucose) expression of a gene operably linked to MRP. It
is also envisioned that MRP may be useful for decreasing
the expression of a target gene operably linked to MRP,
such that the cell containing the MRP-gene of interest is
now extremely sensitive to a compound of interest. For
example, it may be desirable to increase susceptibility
or resistance to a particular therapeutic compound.
Similarly, MRP is useful for inducing expression of a
gene operatively linked to MRP, by growing a host cell
containing an MRP-gene construct on a maltose-containing
medium. It may be desirable to elevate gene expression
for screening various therapeutic compounds for their
effect on the gene product.

The following examples are intended to illustrate
but not limit the invention. While they are typical of
those that might be used, other procedures known to those
skilled in the art may alternatively be used.

EXAMPLES
EXAMPLE 1 

ISOLATION OF CHITIN SYNTHASE FROM CANDIDA ALBICANS

Using Southern blotting, the restriction maps for
the cloned CHS1 gene contained in pJAIV and the genomic
CHS1 locus were produced; however, the maps were found
not to match. Additional studies indicated that pJAIV
contained two nonadjacent genomic DNA fragments as
diagrammed in FIGURE 1a. As a consequence, pJAIV lacked
the 5' end of CHS1. To clone this region, a plasmid
rescue strategy was employed. Plasmid pKW025, which
contains a 600 bp KpnI/EcoRI fragment of CHS1 and a 1.4
kb Candida URA3 gene cloned into pSK(-), was cut with
ClaI and transformed into Candida albicans strain CAI-4.
Transformants were examined by Southern blot and strain
CAI-4A was identified, containing pKW025 integrated at
the CHS1 locus. Genomic DNA was extracted from CAI-4A
and cut with HindIII. Because pKW025 and the sequenced
portion of CHS1 contain no HindIII sites, this digestion
yields, on a single DNA fragment, pKW025 plus the genomic
CHS1 locus with flanking regions extending to the 5' and
3' HindIII sites. Ligation was carried out at a low
DNA concentration to promote intramolecular ligation
events, and the DNA was transformed into E. coli. Recovered
plasmids were screened by PCR to verify that they
contained contiguous CHS1 sequence.

Plasmid pKW030 (12 kb total) was identified and
contained approximately 2 kb of CHS1 sequence upstream of
the XhoI site. A 3.6 kb HindIII/PstI fragment was cloned
into the HindIII/PstI sites of pSK(-), forming plasmid
pKW032. The 3' region of the gene was derived from
plasmid pKW013 (originally derived from pJA-IV). A 3.5 kb
BstEII/NotI fragment was cloned into the BstEII/NotI
sites of pKW032, forming plasmid pKW035. pKW035 was cut
with various restriction enzymes, and Southern blot
analysis was also carried out to confirm that the insert was
indeed an uninterrupted CHS1 gene whose restriction
pattern matched that of the chromosomal CHS1.

The insert was sequenced by standard methods and
the nucleotide and deduced amino acid sequences are shown
in Figure 1b-g (SEQ ID NO:1 and 2).
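
As an illustration of how a deduced amino acid sequence follows from a nucleotide sequence, the Python sketch below translates the first sixteen codons of SEQ ID NO:1 with the standard genetic code, which is the table the sequence listing appears to follow. The function name translate and the compact table layout are choices made for this illustration only.

```python
# Minimal sketch: translate a coding sequence with the standard genetic code.
# The 64-letter string lists amino acids for codons ordered T, C, A, G at each
# of the three positions ("*" marks a stop codon).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA (spaces allowed) into one-letter amino acid codes,
    stopping at the first stop codon."""
    dna = dna.replace(" ", "").upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First row of SEQ ID NO:1 as printed in the sequence listing.
FIRST_ROW = "ATG AAG AAT CCA TTT GAC AGT GGC AGT GAC GAT GAA GAT CCA TTT CTT"

if __name__ == "__main__":
    print(translate(FIRST_ROW))  # MKNPFDSGSDDEDPFL
    # The listed coding region (1...3081) corresponds to 3081 / 3 codons.
    print(3081 // 3)  # 1027
```

The one-letter output corresponds to Met Lys Asn Pro Phe Asp Ser Gly Ser Asp Asp Glu Asp Pro Phe Leu, and the codon count agrees with the 1027 amino acids stated for SEQ ID NO:2.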

EXAMPLE 2

CONSTRUCTION OF PROMOTER ISOLATION VECTOR
The Candida albicans URA3 gene was amplified by
PCR and a SalI site was inserted next to the ATG. The 3'
primer used contained a genomic XbaI site. The SalI/XbaI
fragment was cloned in Bluescript KS+ at SalI/XbaI. The
C. albicans EcoRV genomic fragment containing the ADE2
gene was cloned in the above plasmid at the XhoI site of
the Bluescript polylinker.

The Ca URA3 gene was amplified by PCR using the
following primers: 

5' Primer URA3-ATG: 5 ' -GGAGGA [ GTCGAC ] ATjgACAGTCAACAg- 3 * 
(SEQ ID NO: 5) Sa21 



15 3' Primer URA3-XbaI: 5 > —CGCATTA A ar:p r TCTArea 1 RGAA£CA£g-3 ' 

Xbal 

(SEQ ID NO: 6) 
(Underlined regions: genomic)

The PCR reaction was as follows: 

100 ng DNA, 50 pmoles each primer, 2.5 mM dNTP, 2.5 mM
MgCl2, 0.5 U Taq polymerase/100 µl.
Reaction:
step 1: 2 min 94 °C
step 2: 1 min 94 °C
step 3: 1 min 57 °C
step 4: n/2 min 72 °C
step 5: steps 2-4 x 30 times
step 6: 10 min 72 °C
step 7: Hold 4 °C

For the cloning, 20 µl of the PCR reaction was run
on 0.7% low melting agarose gel and the band was purified
using the Promega (Madison, WI) PCR purification resin.
The purified band and 1 µg of Stratagene KS+ Bluescript
(Figure 2a; Stratagene, La Jolla, CA) were digested with
SalI and XbaI, gel isolated (as above) and eluted in 50
µl water.

The ligation reaction was performed as follows:
Ligation (20 µl): 1 µl vector, 10 µl digested PCR band, 2
µl T4 ligase buffer, 1 µl (2 units) T4 ligase
(Boehringer), 6 µl H2O, overnight at room temperature.
10 µl of the ligation was used to transform Stratagene
XL1 Blue ultracompetent cells, selecting for ampicillin
resistance. Individual colonies were grown in LB +
ampicillin and plasmid DNA was isolated using Qiagen
(Chatsworth, CA) spin columns.
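
As a quick arithmetic check, the listed ligation components can be summed to confirm that they add up to the stated 20 µl. The Python sketch below does this and also reports the fold concentration the ligase buffer stock would need to reach 1x in the final mix; the inference that the buffer was a 10x stock is an assumption drawn from the volumes, not a statement in the text. The names LIGATION_UL, total_volume and buffer_fold are hypothetical.

```python
# Minimal sketch: check that the ligation components sum to the stated volume.
# Volumes are in microliters, as listed for the 20 ul ligation.

LIGATION_UL = {
    "vector": 1,
    "digested PCR band": 10,
    "T4 ligase buffer": 2,
    "T4 ligase": 1,
    "water": 6,
}


def total_volume(mix: dict) -> float:
    """Sum of all component volumes."""
    return sum(mix.values())


def buffer_fold(mix: dict, buffer_key: str = "T4 ligase buffer") -> float:
    """Fold concentration the buffer stock needs to be 1x in the final mix."""
    return total_volume(mix) / mix[buffer_key]


if __name__ == "__main__":
    print(total_volume(LIGATION_UL))  # 20
    print(buffer_fold(LIGATION_UL))   # 10.0
```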

The above plasmid was digested with XhoI, filled
in with Klenow for 30 min and dephosphorylated with acid
phosphatase for 5 min. The band was gel purified as
above. The EcoRV fragment containing the Ca ADE2 gene was
cloned into the plasmid using the conditions described
above (Figure 2b).

EXAMPLE 3 

ISOLATION AND CHARACTERIZATION OF A MALTOSE
INDUCED/GLUCOSE REPRESSED PROMOTER OF C. ALBICANS

Using the promoter probe vector pVGCAV2 (based on URA3
expression), a library was constructed in which 1-2
kb Sau3A fragments (isolated by sucrose gradient
centrifugation) were inserted upstream (5') of the
promoterless URA3 reporter gene in the vector. The vector
plasmid was cut with SalI and partially end filled with dT
and dC, while the insert fragments (Sau3A cut) were
partially filled in with dG and dA. These partial fill-in
reactions left 2 bp overhangs that are compatible for a
ligation reaction. The ligated library was introduced
into E. coli strain DH5α by
electroporation, and gave rise to 76,500 independent
transformants. Sixteen randomly picked colonies all
proved to have inserts, indicating the library was sound.
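
The partial fill-in strategy can be checked with a short Python sketch. It models a 5' overhang being copied from the recessed 3' end until the polymerase needs a nucleotide that was not supplied, then tests whether the shortened vector and insert overhangs can anneal. The overhang sequences are those left by SalI (TCGA) and Sau3A (GATC); the function names are hypothetical and the model ignores enzyme fidelity and reaction efficiency.

```python
# Minimal sketch of partial end filling. A 5' overhang (written 5'->3') is
# copied starting at the base next to the duplex, i.e. from its 3' end toward
# its 5' end. Filling stops at the first base whose complement is missing
# from the supplied nucleotides.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def remaining_overhang(overhang: str, dntps: set) -> str:
    """Overhang left single-stranded after a partial fill-in."""
    filled = 0
    for base in reversed(overhang):
        if COMPLEMENT[base] not in dntps:
            break
        filled += 1
    return overhang[: len(overhang) - filled]


def can_anneal(end_a: str, end_b: str) -> bool:
    """True when two overhangs are complementary to each other."""
    return end_a == revcomp(end_b)


if __name__ == "__main__":
    vector_end = remaining_overhang("TCGA", {"T", "C"})  # SalI end, dT + dC
    insert_end = remaining_overhang("GATC", {"G", "A"})  # Sau3A end, dG + dA
    print(vector_end, insert_end)               # TC GA
    print(can_anneal(vector_end, insert_end))   # vector joins insert
    print(can_anneal(vector_end, vector_end))   # vector cannot self-ligate
    print(can_anneal(insert_end, insert_end))   # inserts cannot concatenate
```

Under this model the vector and insert ends pair with each other but not with themselves, which is consistent with the observation that all sixteen colonies examined carried inserts.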

The plasmid library was extracted from E. coli by
standard plasmid isolation procedures and cut at the
unique BamHI site within the ADE2 gene for targeted
integration at the ADE2 locus of C. albicans strain CaI8
(ade2 ura3). The ade2 mutation of CaI8 allows for
selection of transformants and the ura3 mutation of CaI8
permits monitoring of expression of the reporter gene
URA3. A first pool of 10,000 independent CaI8
transformants was tested for regulated URA3 expression.
The CaI8 transformants were plated on Synthetic Dextrose
[glucose medium (2% glucose (w/v) and yeast nitrogen base
without amino acids at 6.7 g/L (Difco)) without uridine]
to determine the frequency of transformants expressing
the URA3 gene constitutively. Fourteen percent of the
Candida CaI8 transformants expressed varying levels of
the URA3 gene as determined by the ability to form
colonies on a medium lacking uridine supplementation.
The pool was then treated with the compound 5-FOA to
remove those transformants expressing the URA3 gene
constitutively (transformants expressing URA3 convert
5-fluoro-orotic acid to a toxic compound and thus can be
eliminated from the pool). To isolate promoters
responding to specific carbon sources, aliquots of the
pool were grown on synthetic glucose medium supplemented
with uridine and replicated to synthetic maltose medium
without uridine. Candida transformants able to produce
colonies on the unsupplemented maltose medium putatively
contained a maltose inducible promoter. Four strains
(MRP-2, MRP-5, MRP-6, MRP-7) showed maltose
dependent growth that was repressed upon the addition of
glucose.
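
The screening logic above can be summarized in a short Python sketch. It is a simplified model under stated assumptions: each transformant is reduced to two growth observations on uridine-free medium, and the function name classify_transformant and its category labels are hypothetical. The count of constitutive expressers is simply 14 percent of the 10,000-member pool reported in the text.

```python
# Minimal sketch of the promoter screen. URA3 expression is read out as growth
# on medium lacking uridine.


def classify_transformant(grows_on_glucose_no_uridine: bool,
                          grows_on_maltose_no_uridine: bool) -> str:
    """Assign a transformant to a promoter class from its growth pattern."""
    if grows_on_glucose_no_uridine:
        # URA3 is expressed on glucose; these clones are removed with 5-FOA.
        return "constitutive"
    if grows_on_maltose_no_uridine:
        # URA3 is expressed only when maltose is the carbon source.
        return "maltose-inducible candidate"
    return "no detectable promoter activity"


def constitutive_count(pool_size: int, percent: int) -> int:
    """Number of clones expected in the constitutive class."""
    return pool_size * percent // 100


if __name__ == "__main__":
    print(classify_transformant(False, True))
    print(constitutive_count(10_000, 14))  # 1400
```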

Chromosomal DNA was extracted from the Candida
CaI8 transformants exhibiting maltose dependent growth
(MRP strains) and digested with the restriction enzyme
BamHI to "release" the MRP clones. The "released"
plasmids were ligated and introduced into E. coli by
transformation. These E. coli transformants were used as
a source of plasmid DNA for dideoxy/chain termination
sequencing. Initial sequencing data using a primer to
URA3 sequences just downstream of the insert (3')
indicated all the MRP strains contained the same insert.
Sequencing data obtained using a primer to ADE2 sequences
(5' to the insert DNA with respect to URA3 transcription)
indicated the clone contained part of a maltase gene and
regulatory sequences (Figure 3a-b, SEQ ID NO:4). The
entire sequence of the clone was assembled and the
portion of the maltase ORF contained on the insert was
shown to be approximately 70% identical in sequence to a
previously cloned promoter of C. albicans maltase
(CAMAL2) (Geber, et al., J. Bacteriology, 174:6992,
1992).

EXAMPLE 4 

IDENTIFICATION OF GENES ESSENTIAL FOR YEAST CELL GROWTH

This experiment used the MRP promoter as a gene
disruption tool, together with the C. albicans CHS1 gene.
A strain was constructed and designated KWC340, in which CHS1
expression is regulated by the carbon source present in
the growth medium. Transcription of CHS1 was induced by
maltose and repressed by glucose. In maltose-containing
medium, KWC340 grows at the same rate as a wild-type
strain. When KWC340 is transferred to glucose-containing
medium, cells stop growing and eventually die. Three
generations after transfer to glucose, short chains of
cells grow but fail to separate. Ten generations after
transfer, growth has stopped. Long chains and clumps of
cells are seen; a large percentage of the cells are
anucleate or multinucleate, indicating a defect in
nuclear segregation. Viability is reduced approximately
500-fold relative to a control culture, as judged by
plating efficiency.

As a first step in constructing a strain in which
the sole functional CHS1 gene was under the control of
the MRP fragment, a vector was constructed in pKS, termed
KW044, with the following features (see Figure ):

(a) the plasmid contained URA3 for selection of
transformants in the Ura- strains CaI4 (CHS1/CHS1) and
167b (CHS1/chs1::hisG);

(b) a 1088 bp PCR fragment of the MRP sequence
(see attached figure showing sites of PCR primers);

(c) 1479 bp of the C. albicans CHS1 N-terminus
that contains a unique XhoI site to target the
transformation/integration event.
This construct fuses the ATG initiation codon of
the CHS1 gene at the same position as the URA3 gene
(original reporter gene used to isolate the MRP clone)
with respect to the MRP fragment. Integration of this
construct at the remaining wild-type CHS1 allele in
strain 167b places the sole functional CHS1 gene under
the transcriptional control of the MRP
fragment. After transformation, integrants of this type
were recovered, as confirmed by Southern analysis. These
integrants grew well on maltose-containing medium
(inducing conditions) but died when replicated to
glucose-containing medium.

When injected into mice, the MRP-CHS1 integrants
were avirulent; the symptoms diagnostic of candidiasis
were not observed, and the kidneys from the mice were
sterile. Thus CHS1 is essential for growth in vitro and
in vivo. Briefly, ICR 4-week-old male mice (Harlan
Sprague Dawley) were housed five per cage; food and water
were given ad libitum according to the National
Institutes of Health guidelines for the ethical treatment
of animals. Strains of C. albicans were grown in SM
medium [2% maltose, 0.7% yeast nitrogen base without
amino acids (Difco Laboratories, Detroit, MI)] to a
density of 10^7 cells/ml. Cells were harvested, washed,
resuspended in sterile water, and injected into mice (10^6
cells/immunocompetent mouse, 10^4 cells/neutropenic mouse)
via the lateral tail veins. For each strain of C.
albicans, five mice were infected. Cages were checked
three times daily for dead or moribund (exhibiting
severe lethargy, vertigo, and ruffled fur) mice.
Moribund mice were euthanized by cervical dislocation and
necropsied. The left and right kidneys were removed and
examined for colonization by C. albicans. In experiments
using neutropenic mice, cyclophosphamide was administered
(150 mg/kg) by intraperitoneal injection 96 and 24 hours
prior to infection. Injections were repeated every three
days for the duration of the experiment. Neutropenia was
verified by comparing the percentage of neutrophils to
total number of leukocytes before and after injection
with cyclophosphamide.
Figure 7, panels A-D, shows the results of the in
vivo experiment. Neutropenic (panels A & B) and
immunocompetent (panels C & D) mice were infected with
the indicated strains of C. albicans: clinical isolate
(strain SC5314, panels A & C); MRP::URA3 (strain MRP2,
a derivative of SC5314 containing one copy of URA3 which
is regulated by MRP, panels A & C); MRP::CHS1 (strain
KWC340, a derivative of SC5314 containing one copy of
CHS1 which is regulated by MRP, panels B & D); and
CHS1/MRP::CHS1 (strain KWC352, a derivative of SC5314
containing two copies of CHS1, one regulated by MRP, the
other by the CHS1 promoter, O, panels B & D).

In conclusion, these results show the MRP clone
controls the expression of two non-cognate genes (CHS1
and URA3) in a regulated manner and demonstrate the
utility of the MRP sequence as a genetic tool in C.
albicans for target validation (determination of gene
essentiality).

Although the invention has been described with
reference to the presently preferred embodiment, it
should be understood that various modifications can be
made without departing from the spirit of the invention.
Accordingly, the invention is limited only by the
following claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION 
(i) APPLICANT: CHEMGENICS PHARMACEUTICALS, INC.

(ii) TITLE OF THE INVENTION: IDENTIFICATION OF EUKARYOTIC 

GROWTH-RELATED GENES AND PROMOTER ISOLATION 
VECTOR AND METHOD OF USE 

(iii) NUMBER OF SEQUENCES: 6 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Fish & Richardson P.C. 

(B) STREET: 225 Franklin Street 

(C) CITY: Boston 

(D) STATE: MA 

(E) COUNTRY: US 

(F) ZIP: 02110-2804 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ Version 2.0

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 01-NOV-1996 

(C) CLASSIFICATION: . 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/551,437 

(B) FILING DATE: 01-NOV-1995 



(viii) ATTORNEY /AGENT INFORMATION: 

(A) NAME: Clark, Paul T. 

(B) REGISTRATION NUMBER: 30,162 

(C) REFERENCE / DOCKET NUMBER: 06286/009WO1 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 617-542-5070 

(B) TELEFAX: 617-542-8906 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3084 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(ix) FEATURE: 

(A) NAME/KEY: Coding Sequence 

(B) LOCATION: 1...3081 
(D) OTHER INFORMATION: 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATG AAG AAT CCA TTT GAC AGT GGC AGT GAC GAT GAA GAT CCA TTT CTT 48
Met Lys Asn Pro Phe Asp Ser Gly Ser Asp Asp Glu Asp Pro Phe Leu
5 10 15

Ser En p^o CCA TCA ATG CCC TAC GCA GCA TAT TTC CCA 96
Ser Asn Pro Gln Ser Ala Pro Ser Met Pro Tyr Ala Ala Tyr Phe Pro
20 25 30

CTG TCG ACT AGT GGA TCT CCA TTT CAC CAA CAC CAA TCC CCA »p» r-a* 144
Leu Ser Thr Ser Gly Ser Pro Phe His Gln Gln Gln Ser Pro Arg Gln
35 40 45

ler Pro £1 t? T ir I CC AGA AGT ACT GCA AGA GCA ACT AGT GAC AGA 192
Ser Pro Asn Ile Phe Ser Arg Ser Thr Ala Arg Ala Thr Ser Asp Arg
50 55 60

rZ cf G o CC CGC J* 0 ACA TAC CAA CCA TTG ^ T TT GAC AGT GAG GAC 240
Thr Ser Pro Arg Lys Thr Tyr Gln Pro Leu Asn Phe Asp Ser Glu Asp
65 70 75 80

GAA GAT GCT AAA GAA AGC GAA TTT ATG GCT GCA ACC TCA AAG CTG AAT 288
Glu Asp Ala Lys Glu Ser Glu Phe Met Ala Ala Thr Ser Lys Leu Asn
85 90 95

Met s2° r!~ l AT GAT ?* T ACC CCG AAC TTA CAA TTC AAC AAA AGC GGC 336
Met Ser Ile Tyr Asp Asn Thr Pro Asn Leu Gln Phe Asn Lys Ser Gly
100 105 110

GC = a? £ CA ° CA AGA GCA CAA TTC ACA TCG AAA GAA TCT CCG AAA AGA 384
Ala Ala Thr Pro Arg Ala Gln Phe Thr Ser Lys Glu Ser Pro Lys Arg
115 120 125

CAA AAA ACT ACT GAA GTG ACC ATT GAC TTT GAC AAT GAT GAT GAT AAC 432
Gln Lys Thr Thr Glu Val Thr Ile Asp Phe Asp Asn Asp Asp Asp Asn
130 135 140

EI u A ° $2° TTA GAA TTT GAA ^T 66(5 TCA CCT CGT CGT TCA TTT CGT 480
Asn His Thr Leu Glu Phe Glu Asn Gly Ser Pro Arg Arg Ser lie £J
145 150 155 160

AGT AGT GCT ATA AGC AGC GAA AGA TTT TTG CCT CCT CCA CAA CCA ATT 528
Ser Ser Ala Ile Ser Ser Glu Arg Phe Leu Pro Pro Pro Gln Pro Ile
165 170 175

TTC TCT CGA GAA ACA TTT GCT GAA GCC AAC TCC CGT GAA GAA GAA AAA 576
Phe Ser Arg Glu Thr Phe Ala Glu Ala Asn Ser Arg Glu Glu Glu Lys
180 185 190

ler A°a En rE rE GAT GAA *** TAC GAT TAT GAT TCA TAC 624
Ser Ala Asp Gln Glu Thr Leu Asp Glu Lys Tyr Asp Tyr Asp Ser Tyr
195 200 205

Gin E« ^ T S AT GAG GAA GTA GAA ACA TTG CAT TCG GAA GGT ACA GCT 672
Gln Lys Gly Tyr Glu Glu Val Glu Thr Leu His Ser Glu Gly Thr Ala
210 215 220

Tyr s"er civ l~ 1°* I™ F GA T CAT GCC AGT CCT GAA ACT ACA 720
Tyr Ser Gly Ser Ser Tyr Leu Ser Asp Asp Ala Ser Pro Glu Thr Thr
225 230 235 240

GAT TAC TTT GGA GCT TCA ATT GAT GGT AAT ATT ATG CAC AAC ATT AAC 768 
Asp Tyr Phe Gly Ala Ser Ile Asp Gly Asn Ile Met His Asn Ile Asn
245 250 255 

AAT GGA TAC GTA CCA AAT AGA GAA AAA ACC ATT ACC AAA AGA AAA GTG 816
Asn Gly Tyr Val Pro Asn Arg Glu Lys Thr Ile Thr Lys Arg Lys Val
260 265 270 

AGA TTA GTT GGT GGC AAA GCA GGT AAC TTG GTC TTG GAG AAT CCA GTT 864 
Arg Leu Val Gly Gly Lys Ala Gly Asn Leu Val Leu Glu Asn Pro Val 
275 280 285 

CCA ACA GAG TTG AGA AAA GTG TTG ACC AGA ACC GAG TCT CCA TTT GGT 912 
Pro Thr Glu Leu Arg Lys Val Leu Thr Arg Thr Glu Ser Pro Phe Gly 
290 295 300 

GAG TTT ACC AAC ATG ACA TAC ACA GCG TGC ACT TCG CAG CCA GAT ACT 960 
Glu Phe Thr Asn Met Thr Tyr Thr Ala Cys Thr Ser Gln Pro Asp Thr
305 310 315 320 

TTT TCT GCT GAA GGG TTC ACC TTA AGA GCT GCC AAA TAC GGC AGA GAA 1008 
Phe Ser Ala Glu Gly Phe Thr Leu Arg Ala Ala Lys Tyr Gly Arg Glu 
325 330 335 

ACT GAG ATT GTC ATT TGT ATA ACC ATG TAT AAT GAG GAC GAA GTT GCA 1056 
Thr Glu Ile Val Ile Cys Ile Thr Met Tyr Asn Glu Asp Glu Val Ala
340 345 350 

TTT GCC AGA ACT ATG CAT GGT GTG ATG AAA AAT ATC GCT CAT TTG TGC 1104 
Phe Ala Arg Thr Met His Gly Val Met Lys Asn Ile Ala His Leu Cys
355 360 365 

TCA CGC CAT AAA TCC AAA ATA TGG GGC AAA GAT AGC TGG AAA AAA GTT 1152 
Ser Arg His Lys Ser Lys Ile Trp Gly Lys Asp Ser Trp Lys Lys Val
370 375 380 

CAA GTG ATA ATT GTT GCA GAT GGT AGA AAT AAA GTT CAA CAA TCC GTT 1200 
Gln Val Ile Ile Val Ala Asp Gly Arg Asn Lys Val Gln Gln Ser Val
385 390 395 400 

CTT GAA TTG CTT ACG GCA ACA GGC TGC TAT CAA GAA AAT TTG GCC AGG 1248 
Leu Glu Leu Leu Thr Ala Thr Gly Cys Tyr Gln Glu Asn Leu Ala Arg
405 410 415 

CCC TAT GTC AAC AAT AGC AAA GTA AAT GCC CAT TTG TTT GAA TAT ACC 1296 
Pro Tyr Val Asn Asn Ser Lys Val Asn Ala His Leu Phe Glu Tyr Thr 
420 425 430 

ACT CAA ATA TCT ATC GAT GAG AAC TTG AAA TTC AAA GGA GAT GAA AAA 1344 
Thr Gln Ile Ser Ile Asp Glu Asn Leu Lys Phe Lys Gly Asp Glu Lys
435 440 445 

AAC CTT GCA CCA GTT CAA GTC TTG TTC TGT TTG AAA GAA CTG AAC CAA 1392 
Asn Leu Ala Pro Val Gln Val Leu Phe Cys Leu Lys Glu Leu Asn Gln
450 455 460 

AAG AAA ATC AAT TCC CAT AGA TGG CTT TTT AAT GCC TTT TGT CCT GTC 1440 
Lys Lys Ile Asn Ser His Arg Trp Leu Phe Asn Ala Phe Cys Pro Val
465 470 475 480 

TTG GAC CCC AAT GTT ATT GTT CTT TTA GAT GTG GGT ACC AAA CCC GAT 1488 
Leu Asp Pro Asn Val Ile Val Leu Leu Asp Val Gly Thr Lys Pro Asp
485 490 495 
AAC CAT GCC ATT TAT AAT CTA TGG AAA GCA TTC GAT AGA GAT TCC AAT 1536
Asn His Ala Ile Tyr Asn Leu Trp Lys Ala Phe Asp Arg Asp Ser Asn
500 505 510

GTA GCA GGG GCT GCT GGT GAA ATT AAA GCG ATG AAA GGT AAA GGT TGG 1584
Val Ala Gly Ala Ala Gly Glu Ile Lys Ala Met Lys Gly Lys Gly Trp
515 520 525

ATT AAT CTT ACA AAT CCA TTA GTT GCG TCA CAG AAT TTT GAG TAT AAA 1632
Ile Asn Leu Thr Asn Pro Leu Val Ala Ser Gln Asn Phe Glu Tyr Lys
530 535 540

TTG TCC AAT ATT CTT GAT AAA CCG TTG GAA TCA CTT TTT GGA TAC ATT 1680
Leu Ser Asn Ile Leu Asp Lys Pro Leu Glu Ser Leu Phe Gly Tyr Ile
545 550 555 560

TCT GTG TTA CCA GGT GCA TTG TCT GCA TAT CGA TAC ATT GCC TTr a a a 1728
Ser Val Leu Pro Gly Ala Leu Ser Ala Tyr Sg JS Ma S
565 570 575

AAC CAC GAT GAT GGT ACA GGG CCA TTG GCT TCT TAT TTC AAA GCT caa 1776
Asn His Asp Asp Gly Thr Gly Pro Leu Ala 2 lyl "e Sy ctS
580 585 590

GAT TTA CTC TGT TCA CAT GAC AAA GAC AAA GAG AAT ACC AAA GCT AAC 1824
Asp Leu Leu Cys Ser His Asp Lys Asp Lys Glu Asn Thr Lys Ala Asn
595 600 605

TTT TTC GAA GCA AAT ATG TAC TTG GCT GAA GAC AGA ATC CTT TGT TCr 1872
Phe Phe Glu Ala Asn Met Tyr Leu Ala Glu Asp Arg Ile Leu Cys Sp
610 615 620

GAA TTG GTA TCA AAA AGA AAT GAC AAT TGG GTT CTT AAA TTT GTT AAA 1920
Glu Leu Val Ser Lys Arg Asn Asp Asn Trp Val Leu Lys Phe Val Lys
625 630 635 640

fin a?* £k C ^ T GAA ACT GAT GTT CCT GAA * ca ATT CCA GAA TTT CTT 1968
Leu Ala Thr Gly Glu Thr Asp Val Pro Glu Thr Ile Ala Glu Phe Leu
645 650 655

lit CGA AGA TGG ATT AAT GGT GCC TTT TTT GCT GCT TTG TAC 2016
Ser Gln Arg Arg Arg Trp Ile Asn Gly Ala Phe Phe Ala Ala Leu Tyr
660 665 670

TCC TTG TAT CAC TTT AGA AAA ATA TGG ACG ACT GAC CAT TCG TAT GCT 2064
Ser Leu Tyr His Phe Arg Lys Ile Trp Thr Thr Asp His Ser Tyr Ala
675 680 685

AGA AAA TTT TGG CTA CAT GTC GAA GAA TTC ATT TAT CAA TTG GTA TCA 2112
Arg Lys Phe Trp Leu His Val Glu Glu Phe Ile Tyr Gln Leu Val Ser
690 695 700

TTA TTG TTT TCA TTT TTT TCT TTG AGT AAT TTC TAT TTA ACA TTT TAT 2160
Leu Leu Phe Ser Phe Phe Ser Leu Ser Asn Phe Tyr Leu Thr Phe Tyr
705 710 715 720

E 2 S SJ SSSSS £ gt - ---- - »«
725 730 735

SJ HI S iS ^ t TTA £° TAT CTC TGT ATC GG * GTT TTG 2256
Gly Phe Trp Ile Phe Thr Leu Phe Asn Tyr Leu Cys Ile Gly Val Leu
740 745 750

ACA TCT TTG TTC ATT GTC TCC ATT GGT AAT AGA CCA CAT GCA TCA AAG 2304

Thr Ser Leu Phe Ile Val Ser Ile Gly Asn Arg Pro His Ala Ser Lys
755 760 765 

AAT ATT TTC AAA ACA TTA ATC ATA TTG TTA ACC ATA TGT GCA TTA TAC 2352 

Asn Ile Phe Lys Thr Leu Ile Ile Leu Leu Thr Ile Cys Ala Leu Tyr
770 775 780 

GCA TTG GTG GTT GGA TTT GTG TTT GTT ATC AAT ACT ATT GCT ACT TTT 2400 

Ala Leu Val Val Gly Phe Val Phe Val Ile Asn Thr Ile Ala Thr Phe

785 790 795 800 

GGA ACC GGT GGA ACA TCT ACC TAT GTG CTC GTT AGT ATT GTG GTT TCA 2448 

Gly Thr Gly Gly Thr Ser Thr Tyr Val Leu Val Ser Ile Val Val Ser
805 810 815

TTG TTG TCC ACC TAT GGT CTT TAT ACG TTA ATG TCC ATT TTG TAC TTG 2496 

Leu Leu Ser Thr Tyr Gly Leu Tyr Thr Leu Met Ser Ile Leu Tyr Leu
820 825 830 

GAC CCA TGG CAC ATG TTG ACT TGT TCT GTA CAA TAC TTT TTG ATG ATT 2544 

Asp Pro Trp His Met Leu Thr Cys Ser Val Gln Tyr Phe Leu Met Ile
835 840 845 

CCA TCG TAC ACT TGT ACA TTA CAA ATA TTT GCA TTT TGT AAT ACT CAC 2592 

Pro Ser Tyr Thr Cys Thr Leu Gln Ile Phe Ala Phe Cys Asn Thr His
850 855 860 

GAT GTC TCG TGG GGT ACA AAA GGT GAC AAC AAT CCA AAA GAA GAT TTG 2640 

Asp Val Ser Trp Gly Thr Lys Gly Asp Asn Asn Pro Lys Glu Asp Leu 

865 870 875 880 

AGT AAT CAG TAC ATT ATT GAG AAA AAT GCC AGT GGA GAA TTT GAG GCT 2688 

Ser Asn Gin Tyr He lie Glu Lys Asn Ala Ser Gly Glu Phe Glu Ala 
885 890 895 

GTT ATT GTT GAT ACA AAT ATC GAT GAA GAT TAC CTT GAG ACA TTA TAT 2736 

Val lie Val Asp Thr Asn lie Asp Glu Asp Tyr Leu Glu Thr Leu Tyr 
900 905 910 

AAT ATC AGG TCA AAG AGA TCA AAC AAA AAA GTG GCT TTG GGC CAT TCT 2784 

Asn lie Arg Ser Lys Arg Ser Asn Lys Lys Val Ala Leu Gly His Ser 
915 920 925 

GAA AAG ACG CCT CTT GAT GGT GAT GAT TAT GCA AAA GAC GTT CGT ACT 2832 

Glu Lys Thr Pro Leu Asp Gly Asp Asp Tyr Ala Lys Asp Val Arg Thr 
930 935 940 

AGA GTT GTG TTG TTT TGG ATG ATT GCA AAT TTG GTA TTT ATA ATG ACC 2880 

Arg Val Val Leu Phe Trp Met lie Ala Asn Leu Val Phe lie Met Thr 

945 950 955 960 

ATG GTA CAA GTT TAC GAG CCA GGT GAT ACC GGA AGA AAC ATT TAT TTG 2928 

Met Val Gin Val Tyr Glu Pro Gly Asp Thr Gly Arg Asn lie Tyr Leu 
965 970 975 

GCC TTT ATT TTG TGG GCA GTG GCA GTG TTG GCT CTT GTC AGA GCT ATT 2976 

Ala Phe He Leu Trp Ala Val Ala Val Leu Ala Leu Val Arg Ala He 
980 985 990 

GGC TCT CTT GGA TAC TTG ATA CAA ACA TAT GCA CGG TTT TTT GTG GAA 3024 

Gly Ser Leu Gly Tyr Leu lie Gin Thr Tyr Ala Arg Phe Phe Val Glu 
995 1000 1005 
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™E = - 5 Kg s; s; s s s S E ffi S "» 



1020 



CCA TTA AAT TAG 

Pro Leu Asn 3084 
1025 
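
The codon-to-residue pairing in the closing rows of SEQ ID NO:1 can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the helper name `translate` and the truncated codon table (only the codons that occur in the two rows shown) are illustrative choices, not part of the disclosure.

```python
# Standard genetic code, restricted to the codons that occur in the two rows below.
CODON_TABLE = {
    "GGC": "Gly", "TCT": "Ser", "CTT": "Leu", "GGA": "Gly", "TAC": "Tyr",
    "TTG": "Leu", "ATA": "Ile", "CAA": "Gln", "ACA": "Thr", "TAT": "Tyr",
    "GCA": "Ala", "CGG": "Arg", "TTT": "Phe", "GTG": "Val", "GAA": "Glu",
    "CCA": "Pro", "TTA": "Leu", "AAT": "Asn", "TAG": "Stop",
}


def translate(codons: str) -> list[str]:
    """Translate space-separated codons, stopping at the first stop codon."""
    residues = []
    for codon in codons.split():
        residue = CODON_TABLE[codon]
        if residue == "Stop":
            break
        residues.append(residue)
    return residues


# Row ending at nucleotide 3024 and the final row of SEQ ID NO:1.
row_3024 = "GGC TCT CTT GGA TAC TTG ATA CAA ACA TAT GCA CGG TTT TTT GTG GAA"
last_row = "CCA TTA AAT TAG"

print(" ".join(translate(row_3024)))
print(" ".join(translate(last_row)))

# 3084 nucleotides form 1028 codons: 1027 residues plus one stop codon.
total_codons = 3084 // 3
print(total_codons - 1)
```

The last line ties the 3084-nucleotide reading frame to the 1027-residue length stated for SEQ ID NO:2.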



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1027 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Lys Asn Pro Phe Asp Ser Gly Ser Asp Asp Glu Asp Pro Phe Leu
1 5 10 15

Ser Asn Pro Gln Ser Ala Pro Ser Met Pro Tyr Ala Ala Tyr Phe Pro
20 25 30

Leu Ser Thr Ser Gly Ser Pro Phe His Gln Gln Gln Ser Pro Arg Gln
35 40 45

Ser Pro Asn Ile Phe Ser Arg Ser Thr Ala Arg Ala Thr Ser Asp Arg
50 55 60

Thr Ser Pro Arg Lys Thr Tyr Gln Pro Leu Asn Phe Asp Ser Glu Asp
65 70 75 80

Glu Asp Ala Lys Glu Ser Glu Phe Met Ala Ala Thr Ser Lys Leu Asn
85 90 95

Met Ser Ile Tyr Asp Asn Thr Pro Asn Leu Gln Phe Asn Lys Ser Gly
100 105 110

Ala Ala Thr Pro Arg Ala Gln Phe Thr Ser Lys Glu Ser Pro Lys Arg
115 120 125

Gln Lys Thr Thr Glu Val Thr Ile Asp Phe Asp Asn Asp Asp Asp Asn
130 135 140

Asn His Thr Leu Glu Phe Glu Asn Gly Ser Pro Arg Arg Ser Phe Arg
145 150 155 160

Ser Ser Ala Ile Ser Ser Glu Arg Phe Leu Pro Pro Pro Gln Pro Ile
165 170 175

Phe Ser Arg Glu Thr Phe Ala Glu Ala Asn Ser Arg Glu Glu Glu Lys
180 185 190

Ser Ala Asp Gln Glu Thr Leu Asp Glu Lys Tyr Asp Tyr Asp Ser Tyr
195 200 205

Gln Lys Gly Tyr Glu Glu Val Glu Thr Leu His Ser Glu Gly Thr Ala
210 215 220

Tyr Ser Gly Ser Ser Tyr Leu Ser Asp Asp Ala Ser Pro Glu Thr Thr
225 230 235 240

Asp Tyr Phe Gly Ala Ser Ile Asp Gly Asn Ile Met His Asn Ile Asn
245 250 255

Asn Gly Tyr Val Pro Asn Arg Glu Lys Thr Ile Thr Lys Arg Lys Val
260 265 270

Arg Leu Val Gly Gly Lys Ala Gly Asn Leu Val Leu Glu Asn Pro Val
275 280 285

Pro Thr Glu Leu Arg Lys Val Leu Thr Arg Thr Glu Ser Pro Phe Gly
290 295 300

Glu Phe Thr Asn Met Thr Tyr Thr Ala Cys Thr Ser Gln Pro Asp Thr
305 310 315 320

Phe Ser Ala Glu Gly Phe Thr Leu Arg Ala Ala Lys Tyr Gly Arg Glu
325 330 335

Thr Glu Ile Val Ile Cys Ile Thr Met Tyr Asn Glu Asp Glu Val Ala
340 345 350

Phe Ala Arg Thr Met His Gly Val Met Lys Asn Ile Ala His Leu Cys
355 360 365

Ser Arg His Lys Ser Lys Ile Trp Gly Lys Asp Ser Trp Lys Lys Val
370 375 380

Gln Val Ile Ile Val Ala Asp Gly Arg Asn Lys Val Gln Gln Ser Val
385 390 395 400

Leu Glu Leu Leu Thr Ala Thr Gly Cys Tyr Gln Glu Asn Leu Ala Arg
405 410 415

Pro Tyr Val Asn Asn Ser Lys Val Asn Ala His Leu Phe Glu Tyr Thr
420 425 430

Thr Gln Ile Ser Ile Asp Glu Asn Leu Lys Phe Lys Gly Asp Glu Lys
435 440 445

Asn Leu Ala Pro Val Gln Val Leu Phe Cys Leu Lys Glu Leu Asn Gln
450 455 460

Lys Lys Ile Asn Ser His Arg Trp Leu Phe Asn Ala Phe Cys Pro Val
465 470 475 480

Leu Asp Pro Asn Val Ile Val Leu Leu Asp Val Gly Thr Lys Pro Asp
485 490 495

Asn His Ala Ile Tyr Asn Leu Trp Lys Ala Phe Asp Arg Asp Ser Asn
500 505 510

Val Ala Gly Ala Ala Gly Glu Ile Lys Ala Met Lys Gly Lys Gly Trp
515 520 525

Ile Asn Leu Thr Asn Pro Leu Val Ala Ser Gln Asn Phe Glu Tyr Lys
530 535 540

Leu Ser Asn Ile Leu Asp Lys Pro Leu Glu Ser Leu Phe Gly Tyr Ile
545 550 555 560

Ser Val Leu Pro Gly Ala Leu Ser Ala Tyr Arg Tyr Ile Ala Leu Lys
565 570 575

Asn His Asp Asp Gly Thr Gly Pro Leu Ala Ser Tyr Phe Lys Gly Glu
580 585 590

Asp Leu Leu Cys Ser His Asp Lys Asp Lys Glu Asn Thr Lys Ala Asn
595 600 605

Phe Phe Glu Ala Asn Met Tyr Leu Ala Glu Asp Arg Ile Leu Cys Trp
610 615 620

Glu Leu Val Ser Lys Arg Asn Asp Asn Trp Val Leu Lys Phe Val Lys
625 630 635 640

Leu Ala Thr Gly Glu Thr Asp Val Pro Glu Thr Ile Ala Glu Phe Leu
645 650 655

Ser Gln Arg Arg Arg Trp Ile Asn Gly Ala Phe Phe Ala Ala Leu Tyr
660 665 670

Ser Leu Tyr His Phe Arg Lys Ile Trp Thr Thr Asp His Ser Tyr Ala
675 680 685

Arg Lys Phe Trp Leu His Val Glu Glu Phe Ile Tyr Gln Leu Val Ser
690 695 700

Leu Leu Phe Ser Phe Phe Ser Leu Ser Asn Phe Tyr Leu Thr Phe Tyr
705 710 715 720

Phe Leu Thr Gly Ser Leu Val Ser Tyr Lys Ser Leu Gly Lys Lys Gly
725 730 735

Gly Phe Trp Ile Phe Thr Leu Phe Asn Tyr Leu Cys Ile Gly Val Leu
740 745 750

Thr Ser Leu Phe Ile Val Ser Ile Gly Asn Arg Pro His Ala Ser Lys
755 760 765

Asn Ile Phe Lys Thr Leu Ile Ile Leu Leu Thr Ile Cys Ala Leu Tyr
770 775 780

Ala Leu Val Val Gly Phe Val Phe Val Ile Asn Thr Ile Ala Thr Phe
785 790 795 800

Gly Thr Gly Gly Thr Ser Thr Tyr Val Leu Val Ser Ile Val Val Ser
805 810 815

Leu Leu Ser Thr Tyr Gly Leu Tyr Thr Leu Met Ser Ile Leu Tyr Leu
820 825 830

Asp Pro Trp His Met Leu Thr Cys Ser Val Gln Tyr Phe Leu Met Ile
835 840 845

Pro Ser Tyr Thr Cys Thr Leu Gln Ile Phe Ala Phe Cys Asn Thr His
850 855 860

Asp Val Ser Trp Gly Thr Lys Gly Asp Asn Asn Pro Lys Glu Asp Leu
865 870 875 880

Ser Asn Gln Tyr Ile Ile Glu Lys Asn Ala Ser Gly Glu Phe Glu Ala
885 890 895

Val Ile Val Asp Thr Asn Ile Asp Glu Asp Tyr Leu Glu Thr Leu Tyr
900 905 910

Asn Ile Arg Ser Lys Arg Ser Asn Lys Lys Val Ala Leu Gly His Ser
915 920 925

Glu Lys Thr Pro Leu Asp Gly Asp Asp Tyr Ala Lys Asp Val Arg Thr
930 935 940

Arg Val Val Leu Phe Trp Met Ile Ala Asn Leu Val Phe Ile Met Thr
945 950 955 960

Met Val Gln Val Tyr Glu Pro Gly Asp Thr Gly Arg Asn Ile Tyr Leu
965 970 975

Ala Phe Ile Leu Trp Ala Val Ala Val Leu Ala Leu Val Arg Ala Ile
980 985 990

Gly Ser Leu Gly Tyr Leu Ile Gln Thr Tyr Ala Arg Phe Phe Val Glu
995 1000 1005

Ser Lys Ser Lys Trp Met Lys Arg Gly Tyr Thr Ala Pro Ser His Asn
1010 1015 1020

Pro Leu Asn
1025

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3084 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

T Af£™™ G 2^ CTCTC ACCG TCACTG CTACTTCTAG GTAAAGAATC ATTAGGTGTT 60 

AGACGTGGTA GTTACGGGAT GCGTCGTATA TGATCACTGT CTTGTAGCGG GGCGTTCTGT 120

ATGGTTGGTA ACTTAAAACT GYCACTCCTG CTTCTACGAT TTCTTTCGCT TAAATACCGA 180 

™°? GACA GCTGATCACC TAGAGGTAAA GTGGTTGTCG TTAGGGGTTC TGTTAgISS lie 

^ A ^S A S AAA GGTCTTCATG ACGTTCTCGT CGTTGGAGTT TCGACTTATA CTCGTATATA 300 

£2™°,°° CCTTGAATGT TAAGTTGTTT TCGCCGCGTC GGTGTGGTTC TCGTGTTaS HZ 

SI??S^ ^ AGAGGCTT ^^TTTTT TGATGACTTC ACTGGTAACT SSSJm 420 

S™ AC J AT TGTTAGTGTG CAATCTTAAA CTTTTACCCA GTGGAGCAGC AAGTAAACCA 480 

tSXISE TTCTAAAAAC GGAGGAGGTG TTGGTTAAAA JESSES Ho 

1%%£^~ C TTCGGTTGAG GSCACTTCTT CTTTTTAGCC GTCTAGTTCT TTGTAATCTA 600 

^™» TGC TAAT ACTAAG TATGGTCTTC CCAATACTCC TTCATCTTTG TAACGTAAGC 660 

G ^I AGAATA AACAGCCTAC TACGGTCAGG %£££££ 120 
CTAATGAAAC CTCGAAGTTA ACTACCATTA TAATACGTGT TGTAATTGTT ACCTATGCAT 780

gJS ATGG "TTCT TTTCACTCTA JSSXSSZ E£E£g£ III 

llt^rlitn ™™ GG TCAAGG TTGT CTCAACTCTT TTCACAACTG GTCTTGGCTC 900 

aa**™™ CACTCAAATG GTTGTACTGT ATGTGTCGCA CGTGAAGCGT CGGTCTATGA 960 

TAAACATATt riS?™™ GAATTCTCGA CGGTTTATGC CGTCTCTTTG ACTCTAACAG llto 

GGTACATATT ACTCCTGCTT CAACGTAAAC GGTCTTGATA CGTACCACAC 1080 

AG S GAGTAAA CACGAG T G CG GTATTTAGCT TTTATACCCC GTTTCTATCG 1140 

JaSSSg aISSSSS " AAC ^ acgt ctaccatctt TATTTCAAGT TGTTAGGCAA llio 

TTATCfiTTTr JSJSfSSHS GTTCTTTTAA ACCGGTCCGG GATACAGTTG 1260 

Hctctaa2S SSSSSSJ ^^TT ATATGGTGAG TTTATAGATA GCTACTCTTG 1320 

C^lU^rJ II™^ 1 T "TTTGGAA CGTGGTCAAG TTCAGAACAA GACAAACTTT 1380 

CTTGACTTGG TTTTCTTTTA GTTAAGGGTA TCTACCGAAA AATTACGGAA AACAGGACAG 1440

AACCTGGGGT TACAATAACA AGAAAATCTA CACCCATGGT TTGGGCTATT GGTACGGTAA 1500

ATATTAGATA CCTTTCGTAA GCTATCTCTA AGGTTACATC GTCCCCGACG ACCACTTTAA 1560

TTTCGCTACT TTCCATTTCC AACCTAATTA GAATGTTTAG GTAATCAACG CAGTGTCTTA 1620

AAACTCATAT TTAACAGGTT ATAAGAACTA TTTGGCAACC TTAGTGAAAA ACCTATGTAA 1680

AGACACAATG GTCCACGTAA CAGACGTATA GCTATGTAAC GGAACTTTTT GGTGCTACTA 1740

CCATGTCCCG GTAACCGAAG AATAAAGTTT CCACTTCTAA ATGAGACAAG TGTACTGTTT 1800

CTGTTTCTCT TATGGTTTCG ATTGAAAAAG CTTCGTTTAT ACATGAACCG ACTTCTGTCT 1860

TAGGAAACAA CCCTTAACCA TAGTTTTTCT TTACTGTTAA CCCAAGAATT TAAACAATTT 1920

GACCGTTGGC CACTTTGACT ACAAGGACTT TGTTAACGTC TTAAAGAAAG CGTTTCTGCT 1980

TCTACCTAAT TACCACGGAA AAAACGACGA AACATGAGGA ACATAGTGAA ATCTTTTTAT 2040

ACCTGCTGAC TGGTAAGCAT ACGATCTTTT AAAACCGATG TACAGCTTCT TAAGTAAATA 2100

GTTAACCATA GTAATAACAA AAGTAAAAAA AGAAACTCAT TAAAGATAAA TTGTAAAATA 2160

AAAAACTGTC CAAGTAACCA CAGAATGTTT TCAGAACCAT TTTTTCCACC TAAAACCTAA 2220

AAGTGTAATA AGTTAATAGA GACATAGCCA CAAAACTGTA GAAACAAGTA ACAGAGGTAA 2280

CCATTATCTG GTGTACGTAG TTTCTTATAA AAGTTTTGTA ATTAGTATAA CAATTGGTAT 2340

ACACGTAATA TGCGTAACCA CCAACCTAAA CACAAACAAT AGTTATGATA ACGATGAAAA 2400

CCTTGGCCAC CTTGTAGATG GATACACGAG CAATCATAAC ACCAAAGTAA CAACAGGTGG 2460

ATACCAGAAA TATGCAATTA CAGGTAAAAC ATGAACCTGG GTACCGTGTA CAACTGAACA 2520

AGACATGTTA TGAAAAACTA CTAAGGTAGC ATGTGAACAT GTAATGTTTA TAAACGTAAA 2580

ACATTATGAG TGCTACAGAG CACCCCATGT TTTCCACTGT TGTTAGGTTT TCTTCTAAAC 2640

TCATTAGTCA TGTAATAACT CTTTTTACGG TCACCTCTTA AACTCCGACA ATAACAACTA 2700

TGTTTATAGC TACTTCTAAT GGAACTCTGT AATATATTAT AGTCCAGTTT CTCTAGTTTG 2760

TTTTTTCACC GAAACCCGGT AAGACTTTTC TGCGGAGAAC TACCACTACT AATACGTTTT 2820

CTGCAAGCAT GATCTCAACA CAACAAAACC TACTAACGTT TAAACCATAA ATATTACTGG 2880

TACCATGTTC AAATGCTCGG TCCACTATGG CCTTCTTTGT AAATAAACCG GAAATAAAAC 2940

ACCCGTCACC GTCACAACCG AGAACAGTCT CGATAACCGA GAGAACCTAT GAACTATGTT 3000

TGTATACGTG CCAAAAAACA CCTTAGCTTC TCATTTACCT ACTTTGCTCC TATATGGCGC 3060

GGCTCAGTGT TAGGTAATTT AATC 3084

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1734 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

ATAATCGTTG TGCTACTGGT AGCTAGTTTC TGCTCTCTCA CTATANGGTC TTAGTGTTGA 60

CTGTCATGTC GATCAAGTTA CTTACAGGTA AATTATTGAG TTTCAATAAG GTTGGTTTCG 120

TTGTGGCTAG TTTTTTCGAT GTTTTACAAA ATGAAAAAAA ACTTAATACA TTTAAGCCAA 180

CAGCTTATTG TAGGTGCTCC TTTCATTATT CGTACTTCCT ACCCCATGGA GTTTAAAATG 240

ATAAYYGAAA TTTAAAGCCA ACTAGCCAAC TAGCCAACTA GCCAGCTAGC MAGMCAAGAC 300

AAAACTAATC ACAAAGACTA AAAGAAAGTG TAGTTATAAA TCATTGCGAG AATTATTGCG 360

AAANGATATT CCGCTTTTCA AAAAAACATT ATTGCGAAAA TCATTGCNGA NGAAAGGGGG 420

AGTTATTTTT GGGGTACTAC TATGCATGTG TTGTTGTCAA TGTCTACCAC AAAAAGGGGC 480

TTCTTTCAAT TGATAAACCT ACCAAAACAT CTGGTAATCA AAAGCTACTT GTGTGAGACT 540

ATATTTATTG TAGATTACAC CCCGCTCTAC AAAGTTACCA TGAAGACAAA ACAACTTGTT 600

TGAAGTTATA TGAATCGATG TTAAAAATCT GCGTCTCGTG GAGAGTAACT TGATTATGTT 660

AGGTCTGCTA TCGTTTATAC TATGACCGCA TCATATACAG GACATTAGAG CATCCTAAAT 720

TAAATCATCC CATTGTTTCA AGTTTCTTTG TTTAGCAAAG AGACAGTTCC AACTTGTTGT 780

CGTCATAATT ATCGGAATAA TTTAAGCGAG GAAAAGTTGT GAAACAAATT GAAGAGTGGA 840

GTGTGGGGGA GGGGGAGGGA AACAAGGAAG TATACCTCCA CCAAGTAGAA CCCAAATACT 900

CCACGTAATC AACAACAAGT AGCCATATAA TTCAAAATTT GTAGTAGTTG GGCAAATAAT 960

ATTTATACCC CCCCACTCCC CCAACCTTCC AATTTTCCTC TTCCTCTGGG AATTTTTTTT 1020

TTTGAAATAC AAATCTCTTT TAAAACCAAC TTAAACCTAT TAATTATGAC AATTGAATAT 1080

ACTTGGTGGA AAGACGCTAC TATTTATCAA ATTTGGCCTG CTTCATATAA AGATTCCAAT 1140

GGTGATGGAA TTGGTGATAT TCCAGGGATA ATTTCTACAT TAGATTATCT TAAAAATTTA 1200

GGAATTGATA TTATTTGGTT AAGTCCAATG TATAAATCCC CTATGGAAGA TATGGGTTAT 1260

GATATTAGTG ATTATGAATC TATAAATCCT GATTTTGGTA CTATGGAAGA CATGCAAAAT 1320

TTAATTGATG GATGTCATGA AAGAGGAATG AAAATTATTT GTGATTTAGT AGTTAATCAT 1380

ACATCATCTG AACATGAATG GTTTAAACAA TCAAGATCAC TGAAATCAAA CCCTAAAAGA 1440

GATTGGTATA TTTGGAAACC ACCGAGAATT GACGCNAAAA ACTGGTGNAA AAATTACCAC 1500

CAAATAATTG GGGGTCATTT TTTTCAGGAT CAGCATGGGA TATGATGAAT TAACCGATGA 1560

ATATTATTTA AGATTATTTG CCAAGGGACA ACCTGATTTA AATTGGGAAA ATGAAGAAAG 1620

TCGTCAAGCA ATTTATAATT CTGCCATGAA ATCATGGTTT GATAAAGGTG TTGATGGATT 1680

TAGAATTGAT GTTGCTGGAT NATATTCTAA AGATCGACCT CNGAATCAAA GGAA 1734



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
GGAGGAGTCG ACATGACAGT CAACAC 26

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CGCATTAAAG CTCTAGAAGA ACCACC 26
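
The two 26-base sequences above (SEQ ID NO:5 and SEQ ID NO:6) can be checked for length and scanned for restriction enzyme recognition sequences. The sketch below is illustrative only: the helper `find_sites` is hypothetical, and the choice of SalI (GTCGAC) and XbaI (TCTAGA) as the sites to look for is an assumption. The recognition sequences themselves are the standard ones, but the listing does not state that these sites were introduced deliberately.

```python
# Sequences copied from the listing rows for SEQ ID NO:5 and SEQ ID NO:6.
PRIMERS = {
    "SEQ ID NO:5": "GGAGGAGTCG ACATGACAGT CAACAC",
    "SEQ ID NO:6": "CGCATTAAAG CTCTAGAAGA ACCACC",
}

# Recognition sequences to scan for (assumed to be of interest here).
SITES = {"SalI": "GTCGAC", "XbaI": "TCTAGA"}


def find_sites(listing_row: str) -> tuple[int, dict[str, int]]:
    """Return the sequence length and the 1-based start of each site found."""
    seq = listing_row.replace(" ", "")
    hits = {}
    for enzyme, site in SITES.items():
        pos = seq.find(site)
        if pos != -1:
            hits[enzyme] = pos + 1  # convert 0-based index to 1-based position
    return len(seq), hits


for name, row in PRIMERS.items():
    length, hits = find_sites(row)
    print(name, length, hits)
```

Both rows come out at 26 bases, matching the stated LENGTH fields, with one recognition sequence found in each.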
What is claimed is:

1. Substantially pure chitin synthase (CHS1)
polypeptide.

2. The CHS1 of claim 1, characterized in that it
has a molecular weight of about 116 kD as determined by
reducing SDS-PAGE.

3. The CHS1 of claim 1, having the amino acid
sequence of SEQ ID NO:2 (Figure 1b-g).

4. An isolated polynucleotide encoding the CHS1
polypeptide of claim 1.

5. The polynucleotide of claim 4, having the
sequence of SEQ ID NO:1 (Figure 1b-g).

6. The CHS1 of claim 1, wherein the CHS1 is
derived from a yeast cell.

7. An expression vector comprising the
polynucleotide of claim 4.

8. A host cell comprising the vector of claim 7.

9. An antibody that binds specifically to the CHS1
polypeptide of claim 1.

10. A method for inhibiting the growth of yeast
comprising contacting the yeast with an inhibiting
effective amount of a reagent which suppresses CHS1
activity.

11. The method of claim 10, wherein the reagent is
a CHS1 antisense sequence.

12. The method of claim 10, wherein the yeast is
Candida albicans.

13. The method of claim 10, wherein the reagent is
an anti-CHS1 antibody.

14. A method for determining whether a compound
affects CHS1 activity, said method comprising:

a) incubating the compound with CHS1 polypeptide,
or with a recombinant cell expressing CHS1 under
conditions sufficient to allow the components to
interact; and

b) determining the effect of the compound on CHS1
activity or expression.

15. The method of claim 14, wherein the effect is
inhibition of CHS1 activity.

16. A vector for identifying a eukaryotic
regulatory polynucleotide which is capable of regulating
gene expression in a prokaryotic host cell, comprising:

a) a selectable marker gene;

b) at the 5' terminus of the marker gene, a
restriction site at which a eukaryotic regulatory
polynucleotide can be inserted to regulate expression of
said marker gene; and

c) a polynucleotide which facilitates
integration of the vector into the genome of said
prokaryotic cell.

17. The vector of claim 16, wherein the marker
gene is an auxotrophic gene.

18. The vector of claim 17, wherein the
auxotrophic gene is URA3.

19. A host cell E. coli comprising the vector of
claim 16.

20. A method for identifying a eukaryotic
regulatory polynucleotide, said method comprising:

a) providing a vector comprising

(i) a selectable marker gene;

(ii) at the 5' terminus of the marker gene,
a restriction endonuclease site at which a eukaryotic
regulatory polynucleotide can be inserted to regulate
expression of said marker gene; and

(iii) a polynucleotide which facilitates
integration of the vector into the genome of a
predetermined cell;

b) inserting genomic DNA of a eukaryotic organism
into said vector at said restriction site;

c) inserting the resultant eukaryotic
polynucleotide-containing vector into a host cell;

d) detecting the selectable marker as an
indication that the inserted eukaryotic polynucleotide is
a regulatory polynucleotide.

21. The method of claim 20, wherein the eukaryote
is a fungal pathogen.

22. The method of claim 21, wherein the fungal
pathogen is selected from the group consisting of Candida
albicans, Rhodotorula sp., Saccharomyces cerevisiae,
Blastoschizomyces capitatus, Histoplasma capsulatum,
Aspergillus fumigatus, Coccidioides immitis,
Paracoccidioides brasiliensis, Blastomyces dermatitidis,
and Cryptococcus neoformans.

23. The method of claim 20, wherein the marker
gene is an auxotrophic gene.

24. The method of claim 23, wherein the
auxotrophic gene is URA3.

25. The method of claim 20, wherein the
predetermined cell is eukaryotic.

26. The method of claim 20, wherein the
predetermined cell is prokaryotic.

27. A library of host cells, wherein each host
cell contains a vector according to claim 16.

28. An isolated regulatory polynucleotide
characterized in that it is induced by maltose and
repressed by glucose.

29. The polynucleotide of claim 28 having the
sequence of SEQ ID NO:4 (Figure 3a-b).

30. The polynucleotide of claim 28, wherein the
polynucleotide is derived from a yeast cell.

31. A method for determining whether a
polynucleotide encodes a growth-associated polypeptide,
said method comprising:

a) incubating a cell comprising the
polynucleotide operably linked with the regulatory
polynucleotide of claim 28, under conditions which
repress the regulatory polynucleotide; and

b) determining the effect on the growth of
the cell.

32. The method of claim 31, wherein the effect is
inhibition of cell growth.
FIG. 1B: CHS1 nucleotide sequence (both strands) with restriction enzyme sites and deduced amino acid sequence, nucleotides 1-540.

FIG. 1C: continuation, nucleotides 541-1080.

FIG. 1D: continuation, nucleotides 1081-1620.

FIG. 1E: continuation, nucleotides 1621-2160.

FIG. 1 (continued): nucleotides 2161-2790.

(SEQ ID NO: 4)

ATAATCGTTG TGCTACTGGT AGCTAGtTTC TGCTCTCTCA 40
CTATAxGGTC tTAGTGTTGA CTGTCATGTC GATCAAGTTA 80
CTTACAGGTA AATTATTGAG TTTCAATAAG GTTGGTTTCG 120
TTGTGGCTAG TTTTTTCGAT GTTTTACAAA ATGAAAAAAA 160
ACTTAATACA TTTAAGCCAA CAGCTTATTG TAGGTGCTCC 200
TTTCATTATT CGTACTTCCT ACCCCATGGA GTTTAAAATG 240
ATAAYYGAAA TTTAAAGCCA ACTAGCCAAC TAGCCAACTA 280
GCCAGCtagC MAGMCAAgAC AAAACTAATC ACAAAGACTA 320
AAAGAAAGTG TAGTTATAAA TCATTGCGAG AATTATTGCG 360
AAAxGATATT CCGCTTTTCA AAAAAACATT ATTGCGAAAA 400
TCATTGCxGA xGAAAGGGGG AGTTATTTTT GGGGTACTAC 440
TATGCATGTG TTGTTGTCAA TGTCTACCAC AAAAAGGGGC 480
TTCTTTCAAT TGATAAACCT ACCAAAACAT CTGGTAATCA 520
AAAGCTACTT GTGTGAGACT ATATTTATTG TAGATTACAC 560
CCCGCTCTAC AAAGTTACCA TGAAGACAAA ACAACTTGTT 600
TGAAGTTATA TGAATCGATG TTAAAaATCT GCGTCTCGTG 640
GAGAGTAACT TGATTATGTT AGGTCTGCTA TCGTTTATAC 680
TATGACCGCA TCATATACAG GACATTAGAG CATCCTAAAT 720
TAAATCATCC CATTGTTTCA AGTTTCTTTG TTTAGCAAAG 760
AGACAGTTCC AACTTGTTGT CGTCATAATT ATCGGAATAA 800

FIG. 3A

TTTAAGCGAG GAAAAGTTGT GAAACAAATT GAAGAGTGGA 840
GTGTGGGGGA GGGGGAGGGA AACAAGGAAG TATACCTCCA 880
CCAAGTAGAA CCCAAATACT CCACGTAATC AACAACAAGT 920
AGCCATATAA TTCAAAATTT GTAGTAGTTg GGCAAATAAT 960
ATTTATACCC CCCCACTCCC CCAACCTTCC AATTTTCCTC 1000
TTCCTCTGGG AATTTTTTTT TTTGAAATAC AAATCTCTTT 1040
TAAAACCAAC TTAAACCTAT TAATTATGAC AATTGAATAT 1080
ACTTGGTGGA AAGACGCTAC TATTTATCAA ATTTGGCCTG 1120
CTTCATATAA AGATTCCAAT GGTGATGGAA TTGGTGATAT 1160
TCCAGGGATA ATTTCTACAT TAGATTATCT TAAAAATTTA 1200
GGAATTGATA TTATTTGGTT AAGTCCAATG TATAAATCCC 1240
CTATGGAAGA TATGGGTTAT GATATTAGTG ATTATGAATC 1280
TATAAATCCT GATTTTGGTA CTATGGAAGA CATGCAAAAT 1320
TTAATTGATG GATGTCATGA AAGAGGAATG AAAATTATTT 1360
GTGATTTAGT AGTTAATCAT ACATCATCTG AACATGAATG 1400
GTTTAAACAA TCAAGATCAC TGAAATCAAA CCCTAAAAGA 1440
GATTGGTATA TTTGGAAACC ACCGAGAATT GACGCxAAAA 1480
ACTGGTGxAA AAATTACCAC CAAATAATTG GGGGTCATTT 1520
TTTTCAGGAT CAGCATGGGA TATGATGAAT TAACCGATGA 1560
aTATTATTTA AGaTTATTTG CCAAGGGACA ACCTGATTTA 1600
AATTGGGAAA ATGAAGAAAG TCGTCAAGCA ATTTATAATT 1640
CTGCCATGAA ATCATGGTTT GATAAAGGTG TTGATGGATT 1680
TAGAATTGAT GTTGCTGGAT xATATTCTAA AGATCGACCT 1720
CxGAATCAAA GGAA 1734